Total least squares: Difference between revisions
Fgnievinski (talk | contribs) |
|||
(245 intermediate revisions by more than 100 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|Statistical technique}} |
|||
{{Mergeto|Robust regression|date=February 2007}} |
|||
[[Image:Total least squares.svg|right|thumb|200px| The bivariate (Deming regression) case of total least squares. The red lines show the error in both ''x'' and ''y''. This is different from the traditional least squares method which measures error parallel to the ''y'' axis. The case shown, with deviations measured perpendicularly, arises when errors in ''x'' and ''y'' have equal variances.]] |
|||
{{Regression bar}} |
|||
In [[applied statistics]], '''total least squares''' is a type of [[errors-in-variables regression]], a [[least squares]] data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of [[Deming regression]] and also of [[orthogonal regression]], and can be applied to both linear and non-linear models. |
|||
'''Errors-in-variables (EIV)''' is a robust modeling technique in [[statistics]], which assumes that every variable can have error or noise. Errors-in-variables is also referred to as '''total least squares''' (TLS), in a broad sense, in the literature of computational mathematics and engineering. However, TLS in a strict sense implies the application of EIV or orthogonal regression to a linear model <math>\mathbf{A x} = \mathbf{b}</math>. |
|||
The total least squares approximation of the data is generically equivalent to the best, in the [[Frobenius norm]], [[low-rank approximation]] of the data matrix.<ref>I. Markovsky and [[Sabine Van Huffel|S. Van Huffel]], ''Overview of total least squares methods.'' Signal Processing, vol. 87, pp. 2283–2302, 2007. [https://eprints.soton.ac.uk/263855/1/tls_overview.pdf preprint]</ref> |
|||
== Robust linear regression == |
|||
In [[linear regression]], the [[least squares]] (LS) attributes all [[errors and residuals in statistics|error]] to the dependent variables. It has variant versions according to other error configurations, including total least squares (i.e. orthogonal error), data least squares (DLS), and constrained or structured TLS. |
|||
== Linear model == |
|||
Given an observation vector <math>\mathbf{b} \in \reals^n</math> and a data [[matrix (mathematics)|matrix]] <math>\mathbf{A} \in \reals^{n \times m}</math>, consider the solution of the [[overdetermined system]] of equations |
|||
===Background=== |
|||
:<math>\mathbf{Ax \approx b}</math>. |
|||
In the [[least squares]] method of data modeling, the [[objective function]] to be minimized, ''S'', is a [[Quadratic form (statistics)|quadratic form]]: |
|||
The ordinary least squares method (OLS) yields the solution <math>\mathbf{x}</math> that minimizes the [[Euclidean norm]] of the [[errors and residuals in statistics|residuals]] <math>\|{\mathbf{Ax-b}}\|_2</math>, where <math>\|\cdot\|_2</math> is also known as the two-norm. The residual is an estimate of the [[errors and residuals in statistics|error]]. |
|||
:<math>S=\mathbf{r^TWr},</math> |
|||
Equivalently, the OLS problem can be paraphrased by |
|||
where ''r'' is the vector of [[errors and residuals in statistics|residuals]] and ''W'' is a weighting matrix. In [[linear least squares (mathematics)|linear least squares]] the model contains equations which are linear in the parameters appearing in the parameter vector <math>\boldsymbol\beta</math>, so the residuals are given by |
|||
:<math>\mathbf{r=y-X\boldsymbol\beta}.</math> |
|||
There are ''m'' observations in '''y''' and ''n'' parameters in '''β''' with ''m''>''n''. '''X''' is a ''m''×''n'' matrix whose elements are either constants or functions of the independent variables, '''x'''. The weight matrix '''W''' is, ideally, the inverse of the [[variance-covariance matrix]] <math>\mathbf M_y</math> of the observations '''y'''. The independent variables are assumed to be error-free. The parameter estimates are found by setting the gradient equations to zero, which results in the normal equations |
|||
<ref group="note">An alternative form is <math>\mathbf{X^TWX\boldsymbol\Delta \boldsymbol\beta=X^T W \boldsymbol\Delta y}</math>, where <math>\boldsymbol\Delta \boldsymbol\beta</math> is the parameter shift from some starting estimate of <math>\boldsymbol\beta</math> and <math>\boldsymbol\Delta \mathbf y</math> is the difference between '''y''' and the value calculated using the starting value of <math>\boldsymbol\beta</math></ref> |
|||
:<math>\mathbf{X^TWX\boldsymbol\beta=X^T Wy}.</math> |
|||
===Allowing observation errors in all variables=== |
|||
:<math> \min_{\mathbf{x}}\|\Delta\mathbf{b}\|_2 \quad |
|||
\mbox{ subject to }\quad \mathbf{Ax}=\mathbf{b}+\Delta\mathbf{b}. </math> |
|||
Now, suppose that both '''x''' and '''y''' are observed subject to error, with variance-covariance matrices <math>\mathbf M_x</math> and <math>\mathbf M_y</math> respectively. In this case the objective function can be written as |
|||
If the data matrix <math>\mathbf{A}</math> is also noisy (i.e. error in both the dependent and the explanatory variables), the OLS solution is no longer optimal. In cases where orthogonal optimization is acceptable, TLS offers a proper formulation: |
|||
:<math>S=\mathbf{r_x^TM_x^{-1}r_x+r_y^TM_y^{-1}r_y},</math> |
|||
where <math>\mathbf r_x</math> and <math>\mathbf r_y</math> are the residuals in '''x''' and '''y''' respectively. Clearly{{Explain|date=February 2021}} these residuals cannot be independent of each other, but they must be constrained by some kind of relationship. Writing the model function as <math>\mathbf{f(r_x,r_y,\boldsymbol\beta)}</math>, the constraints are expressed by ''m'' condition equations.<ref>W.E. Deming, Statistical Adjustment of Data, Wiley, 1943</ref> |
|||
:<math>\mathbf{F=\Delta y -\frac{\partial f}{\partial r_x} r_x-\frac{\partial f}{\partial r_y} r_y -X\Delta\boldsymbol\beta=0}.</math> |
|||
:<math> \min_{\mathbf{x}} \|{[{\Delta\mathbf{A}\,\Delta\mathbf{b}}]}\|_F \quad |
|||
Thus, the problem is to minimize the objective function subject to the ''m'' constraints. It is solved by the use of [[Lagrange multipliers]]. After some algebraic manipulations,<ref>{{cite book |last=Gans |first=Peter |title=Data Fitting in the Chemical Sciences |year=1992 |publisher=Wiley |isbn=9780471934127 |url=http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471934127.html |access-date=4 December 2012}}</ref> the result is obtained. |
|||
\mbox{ subject to }\quad (\mathbf{A}+\Delta\mathbf{A})\mathbf{x}=\mathbf{b}+\Delta\mathbf{b},</math> |
|||
:<math>\mathbf{X^TM^{-1}X\Delta \boldsymbol\beta=X^T M^{-1} \Delta y}, </math> |
|||
where <math>\|{\cdot}\|_F</math> is the [[Frobenius norm]] (or colloquially the "length" of the vector); and the perturbations <math>\Delta\mathbf{A}</math> and <math>\Delta\mathbf{b}</math> are used to compensate for the noisy signals <math>\mathbf{A}</math> and <math>\mathbf{b}</math>, respectively. This formulation of TLS also implies that the noises are assumed to be independent and identically distributed (<i>i.i.d.</i>) both in <math>\mathbf{A}</math> and <math>\mathbf{b}</math>. Note that the objective can have a weighting matrix according to the distribution of errors if the distribution is known or well-estimated, which is called the constrained or structured TLS. |
|||
or alternatively <math>\mathbf{X^TM^{-1}X \boldsymbol\beta=X^T M^{-1} y},</math> |
|||
In the other case, where the noise is only in <math>\mathbf{A}</math>, DLS can be used alternatively as |
|||
where '''M''' is the variance-covariance matrix relative to both independent and dependent variables. |
|||
:<math>\mathbf{M=K_xM_xK_x^T+K_yM_yK_y^T;\ K_x=-\frac{\partial f}{\partial r_x},\ K_y=-\frac{\partial f}{\partial r_y}}.</math> |
|||
=== Example === |
|||
:<math> \min_{\mathbf{x}} \|{[{\Delta\mathbf{A}}]}\|_F \quad \mbox{ subject to } \quad (\mathbf{A}+\Delta\mathbf{A})\mathbf{x}=\mathbf{b}.</math> |
|||
When the data errors are uncorrelated, all matrices '''M''' and '''W''' are diagonal. Then, take the example of straight line fitting. |
|||
The solution of the OLS problem can be obtained by using the (pseudo-)inverse of the data [[Matrix (mathematics)|matrix]]. Solutions to the TLS and DLS problems have been shown to be closely connected to a set of singular vectors of the (augmented) system-related matrix corresponding to the minimum singular value. |
|||
:<math>f(x_i,\beta)=\alpha + \beta x_i</math> |
|||
in this case |
|||
:<math>M_{ii}=\sigma^2_{y,i}+\beta^2 \sigma^2_{x,i}</math> |
|||
showing how the variance at the ''i''th point is determined by the variances of both independent and dependent variables and by the model being used to fit the data. The expression may be generalized by noting that the parameter <math>\beta</math> is the slope of the line. |
|||
:<math>M_{ii}=\sigma^2_{y,i}+\left(\frac{dy}{dx}\right)^2_i \sigma^2_{x,i}</math> |
|||
An expression of this type is used in fitting [[Determination of equilibrium constants#Parameter errors and correlation|pH titration data]] where a small error on ''x'' translates to a large error on y when the slope is large. |
|||
== References == |
|||
* S. V. Huffel and P. Lemmerling, ''Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications''. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2002. |
|||
* S. Jo and S. W. Kim, "Consistent normalized least mean square filtering with noisy data matrix," ''IEEE Trans. Signal Processing'', vol. 53, no. 6, pp. 2112-2123, Jun. 2005. |
|||
* R. D. DeGroat and E. M. Dowling, "The data least squares problem and channel equalization," ''IEEE Trans. Signal Processing'', vol. 41, no. 1, pp. 407–411, Jan. 1993. |
|||
* T. Abatzoglou and J. Mendel, "Constrained total least squares," in ''Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’87)'', Apr. 1987, vol. 12, pp. 1485–1488. |
|||
=== Algebraic point of view === |
|||
[[Category:Statistics]] |
|||
As was shown in 1980 by Golub and Van Loan, the TLS problem does not have a solution in general.<ref>G. H. Golub and C. F. Van Loan, An analysis of the total least squares problem. Numer. Anal., 17, 1980, pp. 883–893.</ref> The following considers the simple case where a unique solution exists without making any particular assumptions. |
|||
The computation of the TLS using [[singular value decomposition]] (SVD) is described in standard texts.<ref>{{Cite book |
|||
| last1=Golub |first1=Gene H. |author-link1=Gene H. Golub |
|||
| last2=Van Loan |first2=Charles F. |author-link2=Charles F. Van Loan |
|||
| title = Matrix Computations |
|||
| edition = 3rd |
|||
| publisher = [[The Johns Hopkins University Press]] |
|||
| year = 1996 |
|||
}} pp 596.</ref> We can solve the equation |
|||
:<math>XB \approx Y</math> |
|||
for ''B'' where ''X'' is ''m''-by-''n'' and ''Y'' is ''m''-by-''k''. <ref group="note">The notation ''XB'' ≈ ''Y'' is used here to reflect the notation used in the earlier part of the article. In the computational literature the problem has been more commonly presented as ''AX'' ≈ ''B'', i.e. with the letter ''X'' used for the ''n''-by-''k'' matrix of unknown regression coefficients.</ref> |
|||
That is, we seek to find ''B'' that minimizes error matrices ''E'' and ''F'' for ''X'' and ''Y'' respectively. That is, |
|||
:<math>\mathrm{argmin}_{B,E,F} \| [E\; F] \|_F, \qquad (X+E) B = Y+F</math> |
|||
where <math>[E\; F]</math> is the [[augmented matrix]] with ''E'' and ''F'' side by side and <math>\|\cdot\|_F</math> is the [[Frobenius norm]], the square root of the sum of the squares of all entries in a matrix and so equivalently the square root of the sum of squares of the lengths of the rows or columns of the matrix. |
|||
This can be rewritten as |
|||
:<math>[(X+E) \; (Y+F)] \begin{bmatrix} B\\ -I_k\end{bmatrix} = 0.</math> |
|||
where <math>I_k</math> is the <math>k\times k</math> identity matrix. |
|||
The goal is then to find <math>[E\; F]</math> that reduces the rank of <math>[X\; Y]</math> by ''k''. Define <math>[U] [\Sigma] [V]^*</math> to be the singular value decomposition of the augmented matrix <math>[X\; Y]</math>. |
|||
:<math>[X\; Y] = [U_X\; U_Y] \begin{bmatrix}\Sigma_X &0 \\ 0 & \Sigma_Y\end{bmatrix}\begin{bmatrix}V_{XX} & V_{XY} \\ V_{YX} & V_{YY}\end{bmatrix}^* = [U_X\; U_Y] \begin{bmatrix}\Sigma_X &0 \\ 0 & \Sigma_Y\end{bmatrix} \begin{bmatrix} V_{XX}^* & V_{YX}^* \\ V_{XY}^* & V_{YY}^*\end{bmatrix}</math> |
|||
where ''V'' is partitioned into blocks corresponding to the shape of ''X'' and ''Y''. |
|||
Using the [[Eckart–Young theorem]], the approximation minimising the norm of the error is such that matrices <math>U</math> and <math>V</math> are unchanged, while the smallest <math>k</math> singular values are replaced with zeroes. That is, we want |
|||
:<math>[(X+E)\; (Y+F)] = [U_X\; U_Y] \begin{bmatrix}\Sigma_X &0 \\ 0 & 0_{k\times k}\end{bmatrix}\begin{bmatrix}V_{XX} & V_{XY} \\ V_{YX} & V_{YY}\end{bmatrix}^*</math> |
|||
so by linearity, |
|||
:<math>[E\; F] = -[U_X\; U_Y] \begin{bmatrix}0_{n\times n} &0 \\ 0 & \Sigma_Y\end{bmatrix}\begin{bmatrix}V_{XX} & V_{XY} \\ V_{YX} & V_{YY}\end{bmatrix}^*. </math> |
|||
We can then remove blocks from the ''U'' and Σ matrices, simplifying to |
|||
:<math>[E\; F] = -U_Y\Sigma_Y \begin{bmatrix}V_{XY}\\V_{YY}\end{bmatrix}^*= -[X\; Y] \begin{bmatrix}V_{XY}\\V_{YY}\end{bmatrix}\begin{bmatrix}V_{XY}\\ V_{YY}\end{bmatrix}^*.</math> |
|||
This provides ''E'' and ''F'' so that |
|||
:<math>[(X+E) \; (Y+F)] \begin{bmatrix}V_{XY}\\ V_{YY}\end{bmatrix} = 0.</math> |
|||
Now if <math>V_{YY}</math> is nonsingular, which is not always the case (note that the behavior of TLS when <math>V_{YY}</math> is singular is not well understood yet), we can then right multiply both sides by <math>-V_{YY}^{-1}</math> to bring the bottom block of the right matrix to the negative identity, giving<ref>Bjõrck, Ake (1996) ''Numerical Methods for Least Squares Problems'', Society for Industrial and Applied Mathematics. {{ISBN|978-0898713602}} {{page needed|date=June 2012}}</ref> |
|||
: <math>[(X+E) \; (Y+F)] \begin{bmatrix} -V_{XY} V_{YY}^{-1} \\ -V_{YY} V_{YY}^{-1}\end{bmatrix} = [(X+E) \; (Y+F)] \begin{bmatrix} B\\ -I_k\end{bmatrix} = 0 ,</math> |
|||
and so |
|||
:<math>B=-V_{XY} V_{YY}^{-1}.</math> |
|||
A naive [[GNU Octave]] implementation of this is: |
|||
<syntaxhighlight lang="matlab"> |
|||
function B = tls(X, Y) |
|||
[m n] = size(X); % n is the width of X (X is m by n) |
|||
Z = [X Y]; % Z is X augmented with Y. |
|||
[U S V] = svd(Z, 0); % find the SVD of Z. |
|||
VXY = V(1:n, 1+n:end); % Take the block of V consisting of the first n rows and the n+1 to last column |
|||
VYY = V(1+n:end, 1+n:end); % Take the bottom-right block of V. |
|||
B = -VXY / VYY; |
|||
end |
|||
</syntaxhighlight> |
|||
The way described above of solving the problem, which requires that the matrix <math>V_{YY}</math> is nonsingular, can be slightly extended by the so-called ''classical TLS algorithm''.<ref>[[Sabine Van Huffel|S. Van Huffel]] and J. Vandewalle (1991) ''The Total Least Squares Problems: Computational Aspects and Analysis''. SIAM Publications, Philadelphia PA.</ref> |
|||
=== Computation === |
|||
The standard implementation of classical TLS algorithm is available through [http://www.netlib.org/vanhuffel/index.html Netlib], see also.<ref>[[Sabine Van Huffel|S. Van Huffel]], Documented Fortran 77 programs of the extended classical total least squares algorithm, the partial singular value decomposition algorithm and the partial total least squares algorithm, Internal Report ESAT-KUL 88/1, ESAT Lab., Dept. of Electrical Engineering, Katholieke Universiteit Leuven, 1988.</ref><ref>[[Sabine Van Huffel|S. Van Huffel]], The extended classical total least squares algorithm, J. Comput. Appl. Math., 25, pp. 111–119, 1989.</ref> All modern implementations based, for example, on solving a sequence of ordinary least squares problems, approximate the matrix <math>B</math> (denoted <math>X</math> in the literature), as introduced by [[Sabine Van Huffel|Van Huffel]] and Vandewalle. It is worth noting, that this <math>B</math> is, however, ''not the TLS solution'' in many cases.<ref>M. Plešinger, The Total Least Squares Problem and Reduction of Data in AX ≈ B. Doctoral Thesis, TU of Liberec and Institute of Computer Science, AS CR Prague, 2008. Ph.D. Thesis</ref><ref>I. Hnětynková, M. Plešinger, D. M. Sima, Z. Strakoš, and [[Sabine Van Huffel|S. Van Huffel]], The total least squares problem in AX ≈ B. A new classification with the relationship to the classical works. SIMAX vol. 32 issue 3 (2011), pp. 748–770.</ref> |
|||
== Non-linear model == |
|||
For [[non-linear least squares|non-linear systems]] similar reasoning shows that the normal equations for an iteration cycle can be written as |
|||
:<math>\mathbf{J^TM^{-1}J\Delta \boldsymbol\beta=J^T M^{-1} \Delta y}, </math> |
|||
where <math>\mathbf{J}</math> is the [[Jacobian matrix and determinant|Jacobian matrix]]. |
|||
== Geometrical interpretation == |
|||
{{main|Curve fitting#Algebraic fit versus geometric fit for curves}} |
|||
{{further|Orthogonal regression}} |
|||
When the independent variable is error-free a residual represents the "vertical" distance between the observed data point and the fitted curve (or surface). In total least squares a residual represents the distance between a data point and the fitted curve measured along some direction. In fact, if both variables are measured in the same units and the errors on both variables are the same, then the residual represents the [[distance from a point to a line|shortest distance between the data point and the fitted curve]], that is, the residual vector is perpendicular to the tangent of the curve. For this reason, this type of regression is sometimes called ''two dimensional Euclidean regression'' (Stein, 1983)<ref>{{cite journal |last=Stein |first=Yaakov J. |title=Two Dimensional Euclidean Regression |url =http://www.dspcsp.com/pubs/euclreg.pdf }}</ref> or ''orthogonal regression''. |
|||
== Scale invariant methods == |
|||
A serious difficulty arises if the variables are not measured in the same units. First consider measuring distance between a data point and the line: what are the measurement units for this distance? If we consider measuring distance based on Pythagoras' Theorem then it is clear that we shall be adding quantities measured in different units, which is meaningless. Secondly, if we rescale one of the variables e.g., measure in grams rather than kilograms, then we shall end up with different results (a different line). To avoid these problems it is sometimes suggested that we convert to dimensionless variables—this may be called normalization or standardization. However, there are various ways of doing this, and these lead to fitted models which are not equivalent to each other. One approach is to normalize by known (or estimated) measurement precision thereby minimizing the [[Mahalanobis distance]] from the points to the line, providing a [[maximum-likelihood]] solution;{{Citation needed|date=July 2009}} the unknown precisions could be found via [[analysis of variance]]. |
|||
In short, total least squares does not have the property of units-invariance—i.e. it is not [[scale invariance|scale invariant]]. For a meaningful model we require this property to hold. A way forward is to realise that residuals (distances) measured in different units can be combined if multiplication is used instead of addition. Consider fitting a line: for each data point the product of the vertical and horizontal residuals equals twice the area of the triangle formed by the residual lines and the fitted line. We choose the line which minimizes the sum of these areas. Nobel laureate [[Paul Samuelson]] proved in 1942 that, in two dimensions, it is the only line expressible solely in terms of the ratios of standard deviations and the correlation coefficient which (1) fits the correct equation when the observations fall on a straight line, (2) exhibits scale invariance, and (3) exhibits invariance under interchange of variables.<ref>{{cite journal |last=Samuelson |first=Paul A. |year=1942 |title=A Note on Alternative Regressions |journal=Econometrica |doi=10.2307/1907024 |jstor=1907024 |volume=10 |issue=1 |pages=80–83}}</ref> This solution has been rediscovered in different disciplines and is variously known as '''standardised major axis''' (Ricker 1975, Warton et al., 2006),<ref>{{cite journal |last=Ricker |first=W. E. |year=1975 |title=A note concerning Professor Jolicoeur's Comments |journal=Journal of the Fisheries Research Board of Canada |doi=10.1139/f75-172 |volume=32 |issue=8 |pages=1494–1498}}</ref><ref>{{cite journal |last1=Warton |first1=David I. |last2=Wright |first2=Ian J. |last3=Falster |first3=Daniel S. |last4=Westoby |first4=Mark |year=2006 |title=Bivariate line-fitting methods for allometry |journal=Biological Reviews |doi=10.1017/S1464793106007007 |volume=81 |issue=2 |pages=259–291|pmid=16573844 |citeseerx=10.1.1.461.9154 |s2cid=16462731 }}</ref> the '''reduced major axis''', the '''geometric mean functional relationship''' (Draper and Smith, 1998),<ref>Draper, NR and Smith, H. ''Applied Regression Analysis'', 3rd edition, pp. 92–96. 1998</ref> '''least products regression''', '''diagonal regression''', '''line of organic correlation''', and the '''least areas line''' (Tofallis, 2002).<ref>{{cite book |last=Tofallis |first=Chris |editor1-last=Van Huffel |editor1-first=Sabine |editor1-link= Sabine Van Huffel |editor2-last=Lemmerling |editor2-first=P. |year=2002 |title=Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications |chapter=Model Fitting for Multiple Variables by Minimising the Geometric Mean Deviation |publisher=Kluwer Academic Publ. |location=Dordrecht |isbn=978-1402004766 |ssrn=1077322}}</ref> |
|||
Tofallis (2015, 2023)<ref>{{ Cite SSRN |last=Tofallis|first=Chris|year=2015|title=Fitting Equations to Data with the Perfect Correlation Relationship |ssrn=2707593}}</ref><ref>Tofallis, C. (2023). Fitting an Equation to Data Impartially. Mathematics, 11(18), 3957. https://ssrn.com/abstract=4556739 https://doi.org/10.3390/math11183957</ref> has extended this approach to deal with multiple variables. The calculations are simpler than for total least squares as they only require knowledge of covariances, and can be computed using standard spreadsheet functions. |
|||
== See also == |
|||
* [[Regression dilution]] |
|||
* [[Deming regression]], a special case with two predictors and independent errors. |
|||
* [[Errors-in-variables model]] |
|||
* [[Gauss-Helmert model]] |
|||
* [[Linear regression]] |
|||
* [[Least squares]] |
|||
* [[Principal component analysis]] |
|||
* [[Principal component regression]] |
|||
==Notes== |
|||
{{reflist|group=note}} |
|||
==References== |
|||
{{Reflist}} |
|||
===Others=== |
|||
* I. Hnětynková, M. Plešinger, D. M. Sima, Z. Strakoš, and [[Sabine Van Huffel|S. Van Huffel]], ''The total least squares problem in AX ≈ B. A new classification with the relationship to the classical works.'' SIMAX vol. 32 issue 3 (2011), pp. 748–770. Available as a [https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2010/2010-38_fp.pdf preprint]. |
|||
* M. Plešinger, ''The Total Least Squares Problem and Reduction of Data in AX ≈ B.'' Doctoral Thesis, TU of Liberec and Institute of Computer Science, AS CR Prague, 2008. [https://www.fp.tul.cz/~plesinger/my_publications/doctoral_thesis/thesis.pdf Ph.D. Thesis] |
|||
* C. C. Paige, Z. Strakoš, ''Core problems in linear algebraic systems.'' SIAM J. Matrix Anal. Appl. 27, 2006, pp. 861–875. {{doi|10.1137/040616991}} |
|||
* [[Sabine Van Huffel|S. Van Huffel]] and P. Lemmerling, ''Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications''. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2002. |
|||
* S. Jo and S. W. Kim, ''Consistent normalized least mean square filtering with noisy data matrix.'' IEEE Trans. Signal Process., vol. 53, no. 6, pp. 2112–2123, Jun. 2005. |
|||
* R. D. DeGroat and E. M. Dowling, ''The data least squares problem and channel equalization.'' IEEE Trans. Signal Process., vol. 41, no. 1, pp. 407–411, Jan. 1993. |
|||
* [[Sabine Van Huffel|S. Van Huffel]] and J. Vandewalle, ''The Total Least Squares Problems: Computational Aspects and Analysis.'' SIAM Publications, Philadelphia PA, 1991. {{doi|10.1137/1.9781611971002}} |
|||
* T. Abatzoglou and J. Mendel, ''Constrained total least squares'', in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’87), Apr. 1987, vol. 12, pp. 1485–1488. |
|||
* P. de Groen ''An introduction to total least squares'', in Nieuw Archief voor Wiskunde, Vierde serie, deel 14, 1996, pp. 237–253 [https://arxiv.org/abs/math.RA/9805076/ arxiv.org]. |
|||
* G. H. Golub and C. F. Van Loan, ''An analysis of the total least squares problem.'' SIAM J. on Numer. Anal., 17, 1980, pp. 883–893. {{doi|10.1137/0717073}} |
|||
* [http://www.mathpages.com/home/kmath110.htm Perpendicular Regression Of A Line] at MathPages |
|||
* A. R. Amiri-Simkooei and S. Jazaeri ''Weighted total least squares formulated by standard least squares theory'', in Journal of Geodetic Science, 2 (2): 113–124, 2012 [http://engold.ui.ac.ir/~amiri/JGS_Amiri_Jazaeri_2012.pdf]. |
|||
{{Least Squares and Regression Analysis}} |
|||
{{DEFAULTSORT:Total Least Squares}} |
|||
[[Category:Applied mathematics]] |
|||
[[Category:Curve fitting]] |
|||
[[Category:Least squares]] |
|||
[[Category:Regression models]] |
Latest revision as of 16:34, 28 October 2024
Part of a series on |
Regression analysis |
---|
Models |
Estimation |
Background |
In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models.
The total least squares approximation of the data is generically equivalent to the best, in the Frobenius norm, low-rank approximation of the data matrix.[1]
Linear model
[edit]Background
[edit]In the least squares method of data modeling, the objective function to be minimized, S, is a quadratic form:
where r is the vector of residuals and W is a weighting matrix. In linear least squares the model contains equations which are linear in the parameters appearing in the parameter vector , so the residuals are given by
There are m observations in y and n parameters in β with m>n. X is a m×n matrix whose elements are either constants or functions of the independent variables, x. The weight matrix W is, ideally, the inverse of the variance-covariance matrix of the observations y. The independent variables are assumed to be error-free. The parameter estimates are found by setting the gradient equations to zero, which results in the normal equations [note 1]
Allowing observation errors in all variables
[edit]Now, suppose that both x and y are observed subject to error, with variance-covariance matrices and respectively. In this case the objective function can be written as
where and are the residuals in x and y respectively. Clearly[further explanation needed] these residuals cannot be independent of each other, but they must be constrained by some kind of relationship. Writing the model function as , the constraints are expressed by m condition equations.[2]
Thus, the problem is to minimize the objective function subject to the m constraints. It is solved by the use of Lagrange multipliers. After some algebraic manipulations,[3] the result is obtained.
or alternatively where M is the variance-covariance matrix relative to both independent and dependent variables.
Example
[edit]When the data errors are uncorrelated, all matrices M and W are diagonal. Then, take the example of straight line fitting.
in this case
showing how the variance at the ith point is determined by the variances of both independent and dependent variables and by the model being used to fit the data. The expression may be generalized by noting that the parameter is the slope of the line.
An expression of this type is used in fitting pH titration data where a small error on x translates to a large error on y when the slope is large.
Algebraic point of view
[edit]As was shown in 1980 by Golub and Van Loan, the TLS problem does not have a solution in general.[4] The following considers the simple case where a unique solution exists without making any particular assumptions.
The computation of the TLS using singular value decomposition (SVD) is described in standard texts.[5] We can solve the equation
for B where X is m-by-n and Y is m-by-k. [note 2]
That is, we seek to find B that minimizes error matrices E and F for X and Y respectively. That is,
where is the augmented matrix with E and F side by side and is the Frobenius norm, the square root of the sum of the squares of all entries in a matrix and so equivalently the square root of the sum of squares of the lengths of the rows or columns of the matrix.
This can be rewritten as
where is the identity matrix. The goal is then to find that reduces the rank of by k. Define to be the singular value decomposition of the augmented matrix .
where V is partitioned into blocks corresponding to the shape of X and Y.
Using the Eckart–Young theorem, the approximation minimising the norm of the error is such that matrices and are unchanged, while the smallest singular values are replaced with zeroes. That is, we want
so by linearity,
We can then remove blocks from the U and Σ matrices, simplifying to
This provides E and F so that
Now if is nonsingular, which is not always the case (note that the behavior of TLS when is singular is not well understood yet), we can then right multiply both sides by to bring the bottom block of the right matrix to the negative identity, giving[6]
and so
A naive GNU Octave implementation of this is:
function B = tls(X, Y)
[m n] = size(X); % n is the width of X (X is m by n)
Z = [X Y]; % Z is X augmented with Y.
[U S V] = svd(Z, 0); % find the SVD of Z.
VXY = V(1:n, 1+n:end); % Take the block of V consisting of the first n rows and the n+1 to last column
VYY = V(1+n:end, 1+n:end); % Take the bottom-right block of V.
B = -VXY / VYY;
end
The way described above of solving the problem, which requires that the matrix is nonsingular, can be slightly extended by the so-called classical TLS algorithm.[7]
Computation
[edit]The standard implementation of classical TLS algorithm is available through Netlib, see also.[8][9] All modern implementations based, for example, on solving a sequence of ordinary least squares problems, approximate the matrix (denoted in the literature), as introduced by Van Huffel and Vandewalle. It is worth noting, that this is, however, not the TLS solution in many cases.[10][11]
Non-linear model
[edit]For non-linear systems similar reasoning shows that the normal equations for an iteration cycle can be written as
where is the Jacobian matrix.
Geometrical interpretation
[edit]When the independent variable is error-free a residual represents the "vertical" distance between the observed data point and the fitted curve (or surface). In total least squares a residual represents the distance between a data point and the fitted curve measured along some direction. In fact, if both variables are measured in the same units and the errors on both variables are the same, then the residual represents the shortest distance between the data point and the fitted curve, that is, the residual vector is perpendicular to the tangent of the curve. For this reason, this type of regression is sometimes called two dimensional Euclidean regression (Stein, 1983)[12] or orthogonal regression.
Scale invariant methods
[edit]A serious difficulty arises if the variables are not measured in the same units. First consider measuring distance between a data point and the line: what are the measurement units for this distance? If we consider measuring distance based on Pythagoras' Theorem then it is clear that we shall be adding quantities measured in different units, which is meaningless. Secondly, if we rescale one of the variables e.g., measure in grams rather than kilograms, then we shall end up with different results (a different line). To avoid these problems it is sometimes suggested that we convert to dimensionless variables—this may be called normalization or standardization. However, there are various ways of doing this, and these lead to fitted models which are not equivalent to each other. One approach is to normalize by known (or estimated) measurement precision thereby minimizing the Mahalanobis distance from the points to the line, providing a maximum-likelihood solution;[citation needed] the unknown precisions could be found via analysis of variance.
In short, total least squares does not have the property of units-invariance—i.e. it is not scale invariant. For a meaningful model we require this property to hold. A way forward is to realise that residuals (distances) measured in different units can be combined if multiplication is used instead of addition. Consider fitting a line: for each data point the product of the vertical and horizontal residuals equals twice the area of the triangle formed by the residual lines and the fitted line. We choose the line which minimizes the sum of these areas. Nobel laureate Paul Samuelson proved in 1942 that, in two dimensions, it is the only line expressible solely in terms of the ratios of standard deviations and the correlation coefficient which (1) fits the correct equation when the observations fall on a straight line, (2) exhibits scale invariance, and (3) exhibits invariance under interchange of variables.[13] This solution has been rediscovered in different disciplines and is variously known as standardised major axis (Ricker 1975, Warton et al., 2006),[14][15] the reduced major axis, the geometric mean functional relationship (Draper and Smith, 1998),[16] least products regression, diagonal regression, line of organic correlation, and the least areas line (Tofallis, 2002).[17]
Tofallis (2015, 2023)[18][19] has extended this approach to deal with multiple variables. The calculations are simpler than for total least squares as they only require knowledge of covariances, and can be computed using standard spreadsheet functions.
See also
[edit]- Regression dilution
- Deming regression, a special case with two predictors and independent errors.
- Errors-in-variables model
- Gauss-Helmert model
- Linear regression
- Least squares
- Principal component analysis
- Principal component regression
Notes
[edit]- ^ An alternative form is , where is the parameter shift from some starting estimate of and is the difference between y and the value calculated using the starting value of
- ^ The notation XB ≈ Y is used here to reflect the notation used in the earlier part of the article. In the computational literature the problem has been more commonly presented as AX ≈ B, i.e. with the letter X used for the n-by-k matrix of unknown regression coefficients.
References
[edit]- ^ I. Markovsky and S. Van Huffel, Overview of total least squares methods. Signal Processing, vol. 87, pp. 2283–2302, 2007. preprint
- ^ W.E. Deming, Statistical Adjustment of Data, Wiley, 1943
- ^ Gans, Peter (1992). Data Fitting in the Chemical Sciences. Wiley. ISBN 9780471934127. Retrieved 4 December 2012.
- ^ G. H. Golub and C. F. Van Loan, An analysis of the total least squares problem. Numer. Anal., 17, 1980, pp. 883–893.
- ^ Golub, Gene H.; Van Loan, Charles F. (1996). Matrix Computations (3rd ed.). The Johns Hopkins University Press. pp 596.
- ^ Bjõrck, Ake (1996) Numerical Methods for Least Squares Problems, Society for Industrial and Applied Mathematics. ISBN 978-0898713602 [page needed]
- ^ S. Van Huffel and J. Vandewalle (1991) The Total Least Squares Problems: Computational Aspects and Analysis. SIAM Publications, Philadelphia PA.
- ^ S. Van Huffel, Documented Fortran 77 programs of the extended classical total least squares algorithm, the partial singular value decomposition algorithm and the partial total least squares algorithm, Internal Report ESAT-KUL 88/1, ESAT Lab., Dept. of Electrical Engineering, Katholieke Universiteit Leuven, 1988.
- ^ S. Van Huffel, The extended classical total least squares algorithm, J. Comput. Appl. Math., 25, pp. 111–119, 1989.
- ^ M. Plešinger, The Total Least Squares Problem and Reduction of Data in AX ≈ B. Doctoral Thesis, TU of Liberec and Institute of Computer Science, AS CR Prague, 2008. Ph.D. Thesis
- ^ I. Hnětynková, M. Plešinger, D. M. Sima, Z. Strakoš, and S. Van Huffel, The total least squares problem in AX ≈ B. A new classification with the relationship to the classical works. SIMAX vol. 32 issue 3 (2011), pp. 748–770.
- ^ Stein, Yaakov J. "Two Dimensional Euclidean Regression" (PDF).
{{cite journal}}
: Cite journal requires|journal=
(help) - ^ Samuelson, Paul A. (1942). "A Note on Alternative Regressions". Econometrica. 10 (1): 80–83. doi:10.2307/1907024. JSTOR 1907024.
- ^ Ricker, W. E. (1975). "A note concerning Professor Jolicoeur's Comments". Journal of the Fisheries Research Board of Canada. 32 (8): 1494–1498. doi:10.1139/f75-172.
- ^ Warton, David I.; Wright, Ian J.; Falster, Daniel S.; Westoby, Mark (2006). "Bivariate line-fitting methods for allometry". Biological Reviews. 81 (2): 259–291. CiteSeerX 10.1.1.461.9154. doi:10.1017/S1464793106007007. PMID 16573844. S2CID 16462731.
- ^ Draper, NR and Smith, H. Applied Regression Analysis, 3rd edition, pp. 92–96. 1998
- ^ Tofallis, Chris (2002). "Model Fitting for Multiple Variables by Minimising the Geometric Mean Deviation". In Van Huffel, Sabine; Lemmerling, P. (eds.). Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications. Dordrecht: Kluwer Academic Publ. ISBN 978-1402004766. SSRN 1077322.
- ^ Tofallis, Chris (2015). "Fitting Equations to Data with the Perfect Correlation Relationship". SSRN 2707593.
- ^ Tofallis, C. (2023). Fitting an Equation to Data Impartially. Mathematics, 11(18), 3957. https://ssrn.com/abstract=4556739 https://doi.org/10.3390/math11183957
Others
[edit]- I. Hnětynková, M. Plešinger, D. M. Sima, Z. Strakoš, and S. Van Huffel, The total least squares problem in AX ≈ B. A new classification with the relationship to the classical works. SIMAX vol. 32 issue 3 (2011), pp. 748–770. Available as a preprint.
- M. Plešinger, The Total Least Squares Problem and Reduction of Data in AX ≈ B. Doctoral Thesis, TU of Liberec and Institute of Computer Science, AS CR Prague, 2008. Ph.D. Thesis
- C. C. Paige, Z. Strakoš, Core problems in linear algebraic systems. SIAM J. Matrix Anal. Appl. 27, 2006, pp. 861–875. doi:10.1137/040616991
- S. Van Huffel and P. Lemmerling, Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2002.
- S. Jo and S. W. Kim, Consistent normalized least mean square filtering with noisy data matrix. IEEE Trans. Signal Process., vol. 53, no. 6, pp. 2112–2123, Jun. 2005.
- R. D. DeGroat and E. M. Dowling, The data least squares problem and channel equalization. IEEE Trans. Signal Process., vol. 41, no. 1, pp. 407–411, Jan. 1993.
- S. Van Huffel and J. Vandewalle, The Total Least Squares Problems: Computational Aspects and Analysis. SIAM Publications, Philadelphia PA, 1991. doi:10.1137/1.9781611971002
- T. Abatzoglou and J. Mendel, Constrained total least squares, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’87), Apr. 1987, vol. 12, pp. 1485–1488.
- P. de Groen An introduction to total least squares, in Nieuw Archief voor Wiskunde, Vierde serie, deel 14, 1996, pp. 237–253 arxiv.org.
- G. H. Golub and C. F. Van Loan, An analysis of the total least squares problem. SIAM J. on Numer. Anal., 17, 1980, pp. 883–893. doi:10.1137/0717073
- Perpendicular Regression Of A Line at MathPages
- A. R. Amiri-Simkooei and S. Jazaeri Weighted total least squares formulated by standard least squares theory, in Journal of Geodetic Science, 2 (2): 113–124, 2012 [1].