Lanczos algorithm: Difference between revisions

From Wikipedia, the free encyclopedia

{{Short description|Numerical eigenvalue calculation}}
{{for|the null space-finding algorithm|block Lanczos algorithm}}
{{for|the interpolation method|Lanczos resampling}}
{{for|the approximation of the gamma function|Lanczos approximation}}


The '''Lanczos algorithm''' is an [[iterative method]] devised by [[Cornelius Lanczos]] that is an adaptation of [[power iteration|power methods]] to find the <math>m</math> "most useful" (tending towards extreme highest/lowest) [[eigenvalues and eigenvectors]] of an <math>n \times n</math> [[Hermitian matrix]], where <math> m </math> is often but not necessarily much smaller than <math> n </math>.<ref>{{cite journal |last=Lanczos |first=C. |title=An iteration method for the solution of the eigenvalue problem of linear differential and integral operators |journal= Journal of Research of the National Bureau of Standards|volume=45 |issue= 4|pages=255–282 |year=1950 |url=https://www.cs.umd.edu/~oleary/lanczos1950.pdf |doi=10.6028/jres.045.026 |doi-access=free }}</ref> Although computationally efficient in principle, the method as initially formulated was not useful, due to its [[Numerical stability|numerical instability]].


In 1970, Ojalvo and Newman showed how to make the method numerically stable and applied it to the solution of very large engineering structures subjected to dynamic loading.<ref name=":0">{{cite journal |last1=Ojalvo |first1=I. U. |last2=Newman |first2=M. |title=Vibration modes of large structures by an automatic matrix-reduction method |journal=[[AIAA Journal]] |volume=8 |issue=7 |pages=1234–1239 |year=1970 |doi=10.2514/3.5878 |bibcode=1970AIAAJ...8.1234N }}</ref> This was achieved using a method for purifying the Lanczos vectors (i.e. by repeatedly reorthogonalizing each newly generated vector with '''all''' previously generated ones)<ref name=":0" /> to any degree of accuracy, which when not performed, produced a series of vectors that were highly contaminated by those associated with the lowest natural frequencies.

0 & & & & \beta_m & \alpha_m \\
\end{pmatrix}</math>.
:'''Note''' <math> A v_j = w_j' = \beta_{j+1} v_{j+1} + \alpha_j v_j + \beta_j v_{j-1} </math> for <math> 1 < j < m </math>.


There are in principle four ways to write the iteration procedure. Paige and other works show that the above order of operations is the most numerically stable.<ref name="CW1985">{{Cite book|last1=Cullum |last2= Willoughby|title=Lanczos Algorithms for Large Symmetric Eigenvalue Computations|year= 1985|volume= 1| isbn= 0-8176-3058-9}}</ref><ref name="Saad1992">{{Cite book|author=Yousef Saad|title=Numerical Methods for Large Eigenvalue Problems| isbn= 0-470-21820-7|url= http://www-users.cs.umn.edu/~saad/books.html|author-link=Yousef Saad|date=1992-06-22}}</ref>
In practice the initial vector <math>v_1</math> may be taken as another argument of the procedure, with <math>\beta_j=0</math> and indicators of numerical imprecision being included as additional loop termination conditions.
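
A minimal [[NumPy]] sketch of this iteration may make the bookkeeping concrete (illustrative only: the helper name <code>lanczos</code> is not from any particular library, and no reorthogonalization is performed, so the stability caveats discussed elsewhere in the article apply in full):

<syntaxhighlight lang="python">
import numpy as np

def lanczos(A, v1, m):
    """m steps of the raw Lanczos iteration: returns the diagonal alpha,
    the off-diagonal beta (beta[0] unused) and the Lanczos vectors V."""
    n = v1.shape[0]
    V = np.zeros((n, m), dtype=A.dtype)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    V[:, 0] = v1 / np.linalg.norm(v1)
    w = A @ V[:, 0]
    alpha[0] = np.vdot(w, V[:, 0]).real
    w = w - alpha[0] * V[:, 0]
    for j in range(1, m):
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0:          # breakdown: a fresh orthogonal vector would be needed here
            break
        V[:, j] = w / beta[j]
        w = A @ V[:, j]
        alpha[j] = np.vdot(w, V[:, j]).real
        w = w - alpha[j] * V[:, j] - beta[j] * V[:, j - 1]
    return alpha, beta, V
</syntaxhighlight>

The tridiagonal matrix <math>T</math> then has <code>alpha</code> on its main diagonal and <code>beta[1:]</code> on its sub- and superdiagonal.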


=== Application to the eigenproblem ===
The Lanczos algorithm is most often brought up in the context of finding the [[eigenvalue]]s and [[eigenvector]]s of a matrix, but whereas an ordinary [[Matrix diagonalization|diagonalization of a matrix]] would make eigenvectors and eigenvalues apparent from inspection, the same is not true for the tridiagonalization performed by the Lanczos algorithm; nontrivial additional steps are needed to compute even a single eigenvalue or eigenvector. Nonetheless, applying the Lanczos algorithm is often a significant step forward in computing the eigendecomposition.
If <math>\lambda</math> is an eigenvalue of <math>T</math>, and <math>x</math> its eigenvector (<math>Tx=\lambda x</math>), then <math> y = V x </math> is a corresponding eigenvector of <math>A</math> with the same eigenvalue:
<math display=block>\begin{align} A y &= A V x \\ &= V T V^* V x \\ &= V T I x \\ &= V T x \\ &= V (\lambda x) \\ &= \lambda V x \\& = \lambda y.\end{align}</math>
Thus the Lanczos algorithm transforms the eigendecomposition problem for <math>A</math> into the eigendecomposition problem for <math>T</math>.
# For tridiagonal matrices, there exist a number of specialised algorithms, often with better computational complexity than general-purpose algorithms. For example, if <math>T</math> is an <math>m \times m</math> tridiagonal symmetric matrix then:
#* The [[Tridiagonal matrix#Determinant|continuant recursion]] allows computing the [[characteristic polynomial]] in <math>O(m^2)</math> operations, and evaluating it at a point in <math>O(m)</math> operations.
#* The [[divide-and-conquer eigenvalue algorithm]] can be used to compute the entire eigendecomposition of <math>T</math> in <math>O(m^2)</math> operations.
#* The Fast Multipole Method<ref>{{cite journal|last1=Coakley|first1=Ed S.|last2=Rokhlin|first2=Vladimir|title=A fast divide-and-conquer algorithm for computing the spectra of real symmetric tridiagonal matrices|journal=Applied and Computational Harmonic Analysis|date=2013|volume=34|issue=3|pages=379–414|doi=10.1016/j.acha.2012.06.003}}</ref> can compute all eigenvalues in just <math>O(m \log m)</math> operations.
# Some general eigendecomposition algorithms, notably the [[QR algorithm]], are known to converge faster for tridiagonal matrices than for general matrices. Asymptotic complexity of tridiagonal QR is <math>O(m^2)</math> just as for the divide-and-conquer algorithm (though the constant factor may be different); since the eigenvectors together have <math>m^2</math> elements, this is asymptotically optimal.
# Even algorithms whose convergence rates are unaffected by unitary transformations, such as the [[power method]] and [[inverse iteration]], may enjoy low-level performance benefits from being applied to the tridiagonal matrix <math>T</math> rather than the original matrix <math>A</math>. Since <math>T</math> is very sparse with all nonzero elements in highly predictable positions, it permits compact storage with excellent performance vis-à-vis [[Cache (computing)|caching]]. Likewise, <math>T</math> is a [[real number|real]] matrix with all eigenvectors and eigenvalues real, whereas <math>A</math> in general may have complex elements and eigenvectors, so real arithmetic is sufficient for finding the eigenvectors and eigenvalues of <math>T</math>.
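
For instance, given the output of a Lanczos run such as the sketch above (the names <code>alpha</code>, <code>beta</code> and <code>V</code> refer to those hypothetical outputs), the small tridiagonal eigenproblem for <math>T</math> can be solved with a specialised solver and mapped back to Ritz vectors of <math>A</math>:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh_tridiagonal

# alpha and beta[1:] are the diagonal and off-diagonal of T,
# V is the n-by-m matrix of Lanczos vectors.
theta, X = eigh_tridiagonal(alpha, beta[1:])  # eigenpairs of the small tridiagonal T
Y = V @ X                                     # Ritz vectors: approximate eigenvectors of A
# theta[-1] approximates the largest eigenvalue of A, theta[0] the smallest.
</syntaxhighlight>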
:## Let <math> u_{j+1} = u_{j+1}' / \| u_{j+1}' \|.</math>
:* In the large <math>j</math> limit, <math>u_j</math> approaches the normed eigenvector corresponding to the largest magnitude eigenvalue.
A critique that can be raised against this method is that it is wasteful: it spends a lot of work (the matrix–vector products in step 2.1) extracting information from the matrix <math>A</math>, but pays attention only to the very last result; implementations typically use the same variable for all the vectors <math>u_j</math>, having each new iteration overwrite the results from the previous one. It may be desirable to instead keep all the intermediate results and organise the data.
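
As a point of reference, the power iteration just described is only a few lines (an illustrative sketch, not tied to any particular library):

<syntaxhighlight lang="python">
import numpy as np

def power_method(A, u1, iterations):
    u = u1 / np.linalg.norm(u1)
    for _ in range(iterations):
        u = A @ u                    # step 2.1: the only place information about A enters
        u = u / np.linalg.norm(u)    # step 2.2: renormalise; the previous iterate is overwritten
    return u                         # approximates the eigenvector for the largest-magnitude eigenvalue
</syntaxhighlight>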


One piece of information that trivially is available from the vectors <math>u_j</math> is a chain of [[Krylov subspace]]s. One way of stating that without introducing sets into the algorithm is to claim that it computes


this is trivially satisfied by <math>v_j = u_j</math> as long as <math>u_j</math> is linearly independent of <math>u_1,\dotsc,u_{j-1}</math> (and in the case that there is such a dependence then one may continue the sequence by picking as <math>v_j</math> an arbitrary vector linearly independent of <math>u_1,\dotsc,u_{j-1}</math>). A basis containing the <math>u_j</math> vectors is however likely to be numerically [[ill-conditioned]], since this sequence of vectors is by design meant to converge to an eigenvector of <math>A</math>. To avoid that, one can combine the power iteration with a [[Gram–Schmidt process]], to instead produce an orthonormal basis of these Krylov subspaces.
# Pick a random vector <math>u_1</math> of Euclidean norm <math>1</math>. Let <math>v_1 = u_1</math>.
# For <math>j = 1,\dotsc,m-1</math> do:
## Let <math> u_{j+1}' = A u_j </math>.
## For all <math> k = 1, \dotsc, j </math> let <math>g_{k,j} = v_k^* u_{j+1}'</math>. (These are the coordinates of <math> A u_j = u_{j+1}' </math> with respect to the basis vectors <math>v_1,\dotsc,v_j</math>.)
## Let <math> w_{j+1} = u_{j+1}' - \sum_{k=1}^j g_{k,j} v_k</math>. (Cancel the component of <math>u_{j+1}'</math> that is in <math>\operatorname{span}(v_1,\dotsc,v_j)</math>.)
## If <math>w_{j+1} \neq 0</math> then let <math> u_{j+1} = u_{j+1}' / \| u_{j+1}' \| </math> and <math> v_{j+1} = w_{j+1} / \| w_{j+1} \| </math>,
##: otherwise pick as <math> u_{j+1} = v_{j+1} </math> an arbitrary vector of Euclidean norm <math>1</math> that is orthogonal to all of <math>v_1,\dotsc,v_j</math>.
The relation between the power iteration vectors <math>u_j</math> and the orthogonal vectors <math>v_j</math> is that
: <math> A u_j = \| u_{j+1}'\| u_{j+1} = u_{j+1}' = w_{j+1} + \sum_{k=1}^j g_{k,j} v_k = \| w_{j+1}\| v_{j+1} + \sum_{k=1}^j g_{k,j} v_k </math>.
Here it may be observed that we do not actually need the <math>u_j</math> vectors to compute these <math>v_j</math>, because <math>u_j - v_j \in \operatorname{span}(v_1,\dotsc,v_{j-1})</math> and therefore the difference between <math> u_{j+1}' = A u_j </math> and <math> w_{j+1}' = A v_j </math> is in <math>\operatorname{span}(v_1,\dotsc,v_j)</math>, which is cancelled out by the orthogonalisation process. Thus the same basis for the chain of Krylov subspaces is computed by
# Pick a random vector <math>v_1</math> of Euclidean norm <math>1</math>.
# For <math>j = 1,\dotsc,m-1</math> do:
## Let <math> w_{j+1}' = A v_j </math>.
## For all <math> k = 1, \dotsc, j </math> let <math>h_{k,j} = v_k^* w_{j+1}'</math>.
## Let <math> w_{j+1} = w_{j+1}' - \sum_{k=1}^j h_{k,j} v_k </math>.
## Let <math> h_{j+1,j} = \| w_{j+1} \| </math>.
## If <math>h_{j+1,j} \neq 0</math> then let <math> v_{j+1} = w_{j+1} / h_{j+1,j} </math>,
##: otherwise pick as <math> v_{j+1} </math> an arbitrary vector of Euclidean norm <math>1</math> that is orthogonal to all of <math>v_1,\dotsc,v_j</math>.
A priori the coefficients <math>h_{k,j}</math> satisfy
: <math> A v_j = \sum_{k=1}^{j+1} h_{k,j} v_k </math> for all <math> j < m </math>;
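
A direct [[NumPy]] transcription of this orthogonalising procedure (the Arnoldi iteration named below) might look as follows; the names are illustrative, and the projection is applied one vector at a time (modified Gram–Schmidt), which is algebraically equivalent to the sums written above:

<syntaxhighlight lang="python">
import numpy as np

def arnoldi(A, v1, m):
    """Orthonormal Krylov basis V and upper Hessenberg coefficients H,
    with A @ V[:, :m-1] == V @ H[:, :m-1] in exact arithmetic."""
    n = v1.shape[0]
    V = np.zeros((n, m), dtype=complex)
    H = np.zeros((m, m), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m - 1):
        w = A @ V[:, j]                    # w'_{j+1} = A v_j
        for k in range(j + 1):
            H[k, j] = np.vdot(V[:, k], w)  # h_{k,j}; equals v_k^* w'_{j+1} in exact arithmetic
            w = w - H[k, j] * V[:, k]      # remove the component along v_k
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0:               # breakdown: the Krylov subspace is invariant under A
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
</syntaxhighlight>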
In particular, the largest eigenvalue <math>\lambda_\max</math> is the global maximum of <math>r</math> and the smallest eigenvalue <math>\lambda_\min</math> is the global minimum of <math>r</math>.


Within a low-dimensional subspace <math>\mathcal{L}</math> of <math>\Complex^n</math> it can be feasible to locate the maximum <math>x</math> and minimum <math>y</math> of <math>r</math>. Repeating that for an increasing chain <math>\mathcal{L}_1 \subset \mathcal{L}_2 \subset \cdots</math> produces two sequences of vectors: <math>x_1, x_2, \ldots</math> and <math>y_1, y_2, \dotsc </math> such that <math>x_j, y_j \in \mathcal{L}_j </math> and


:<math>\begin{align}
r(x_1) &\leqslant r(x_2) \leqslant \cdots \leqslant \lambda_\max, &
r(y_1) &\geqslant r(y_2) \geqslant \cdots \geqslant \lambda_\min.
\end{align}</math>
: <math>\nabla r(x) = \frac{2}{x^* x} ( A x - r(x) x ), </math>


so the directions of interest are easy enough to compute in matrix arithmetic, but if one wishes to improve on both <math>x_j</math> and <math>y_j</math> then there are two new directions to take into account: <math>Ax_j</math> and <math>Ay_j;</math> since <math>x_j</math> and <math>y_j</math> can be linearly independent vectors (indeed, are close to orthogonal), one cannot in general expect <math>Ax_j</math> and <math>Ay_j</math> to be parallel. It is not necessary to increase the dimension of <math>\mathcal{L}_j</math> by <math>2</math> on every step if <math>\{\mathcal{L}_j\}_{j=1}^m</math> are taken to be Krylov subspaces, because then <math>Az \in \mathcal{L}_{j+1}</math> for all <math>z \in \mathcal{L}_j,</math> thus in particular for both <math>z = x_j</math> and <math>z = y_j</math>.
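
The quantities involved are cheap to evaluate; a small sketch (illustrative helper names, not from any library):

<syntaxhighlight lang="python">
import numpy as np

def rayleigh_quotient(A, x):
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

def rayleigh_gradient(A, x):
    # the gradient formula above: (2 / x^* x) (A x - r(x) x)
    return (2.0 / np.vdot(x, x).real) * (A @ x - rayleigh_quotient(A, x) * x)
</syntaxhighlight>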


In other words, we can start with some arbitrary initial vector <math>x_1 = y_1,</math> construct the vector spaces

When analysing the dynamics of the algorithm, it is convenient to take the eigenvalues and eigenvectors of <math>A</math> as given, even though they are not explicitly known to the user. To fix notation, let <math>\lambda_1 \geqslant \lambda_2 \geqslant \dotsb \geqslant \lambda_n</math> be the eigenvalues (these are known to all be real, and thus possible to order) and let <math>z_1,\dotsc,z_n</math> be an orthonormal set of eigenvectors such that <math>A z_k = \lambda_k z_k</math> for all <math>k=1,\dotsc,n</math>.


It is also convenient to fix a notation for the coefficients of the initial Lanczos vector <math>v_1</math> with respect to this eigenbasis; let <math>d_k = z_k^* v_1</math> for all <math>k=1,\dotsc,n</math>, so that <math> \textstyle v_1 = \sum_{k=1}^n d_k z_k</math>. A starting vector <math>v_1</math> depleted of some eigencomponent will delay convergence to the corresponding eigenvalue, and even though this just comes out as a constant factor in the error bounds, depletion remains undesirable. One common technique for avoiding being consistently hit by it is to pick <math>v_1</math> by first drawing the elements randomly according to the same [[normal distribution]] with mean <math>0</math> and then rescale the vector to norm <math>1</math>. Prior to the rescaling, this causes the coefficients <math>d_k</math> to also be independent normally distributed stochastic variables from the same normal distribution (since the change of coordinates is unitary), and after rescaling the vector <math>(d_1,\dotsc,d_n)</math> will have a [[uniform distribution (continuous)|uniform distribution]] on the unit sphere in <math>\mathbb{C}^n</math>. This makes it possible to bound the probability that for example <math>|d_1| < \varepsilon</math>.


The fact that the Lanczos algorithm is coordinate-agnostic – operations only look at inner products of vectors, never at individual elements of vectors – makes it easy to construct examples with known eigenstructure to run the algorithm on: make <math>A</math> a diagonal matrix with the desired eigenvalues on the diagonal; as long as the starting vector <math>v_1</math> has enough nonzero elements, the algorithm will output a general tridiagonal symmetric matrix as <math>T</math>.
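
For example, a small experiment along these lines might be set up as follows (a sketch; <code>lanczos</code> stands for an implementation such as the one given earlier):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
eigenvalues = np.array([10.0, 9.5, 4.0, 1.0, 0.5, -2.0])
A = np.diag(eigenvalues)                    # known eigenstructure

v1 = rng.standard_normal(6)                 # normally distributed elements ...
v1 /= np.linalg.norm(v1)                    # ... rescaled to norm 1

alpha, beta, V = lanczos(A, v1, 6)          # hypothetical helper from the earlier sketch
T = np.diag(alpha) + np.diag(beta[1:], 1) + np.diag(beta[1:], -1)
print(np.sort(np.linalg.eigvalsh(T)))       # reproduces the chosen eigenvalues (up to roundoff)
</syntaxhighlight>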
Lanczos algorithms are very attractive because the multiplication by <math>A\,</math> is the only large-scale linear operation. Since weighted-term text retrieval engines implement just this operation, the Lanczos algorithm can be applied efficiently to text documents (see [[latent semantic indexing]]). Eigenvectors are also important for large-scale ranking methods such as the [[HITS algorithm]] developed by [[Jon Kleinberg]], or the [[PageRank]] algorithm used by Google.


Lanczos algorithms are also used in [[condensed matter physics]] as a method for solving [[Hamiltonian matrix|Hamiltonians]] of [[Strongly correlated material|strongly correlated electron systems]],<ref>{{cite journal|last=Chen|first=HY|author2=Atkinson, W.A. |author3=Wortis, R. |title=Disorder-induced zero-bias anomaly in the Anderson-Hubbard model: Numerical and analytical calculations|journal=Physical Review B|date=July 2011|volume=84|issue=4|pages=045113|doi=10.1103/PhysRevB.84.045113|arxiv=1012.1031|bibcode=2011PhRvB..84d5113C|s2cid=118722138}}</ref> as well as in [[nuclear shell model|shell model]] codes in [[nuclear physics]].<ref>{{cite arXiv|last=Shimizu|first=Noritaka|title=Nuclear shell-model code for massive parallel computation, "KSHELL"|eprint=1310.5431|date=21 October 2013|class=nucl-th}}</ref>


==Implementations==
The [[NAG Numerical Library|NAG Library]] contains several routines<ref>{{ cite web | author = The Numerical Algorithms Group | title = Keyword Index: Lanczos | work = NAG Library Manual, Mark 23 | url = http://www.nag.co.uk/numeric/fl/nagdoc_fl23/html/INDEXES/KWIC/lanczos.html | access-date = 2012-02-09 }}</ref> for the solution of large scale linear systems and eigenproblems which use the Lanczos algorithm.


[[MATLAB]] and [[GNU Octave]] come with [[ARPACK]] built-in. Both stored and implicit matrices can be analyzed through the ''eigs()'' function ([http://www.mathworks.com/help/techdoc/ref/eigs.html Matlab]/[https://www.gnu.org/software/octave/doc/interpreter/Sparse-Linear-Algebra.html#doc_002deigs Octave]).

Similarly, in [[Python_(programming_language)|Python]], the [[SciPy]] package has [https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigsh.html scipy.sparse.linalg.eigsh], a wrapper for the SSEUPD and DSEUPD functions from [[ARPACK]], which use the implicitly restarted Lanczos method.
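
Typical usage (standard SciPy API):

<syntaxhighlight lang="python">
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 1000
M = sp.random(n, n, density=1e-3, format="csr", random_state=0)
A = (M + M.T) * 0.5                      # symmetrise so that eigsh is applicable

vals, vecs = eigsh(A, k=6, which="LA")   # six eigenvalues of largest algebraic value
</syntaxhighlight>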


A Matlab implementation of the Lanczos algorithm (note precision issues) is available as a part of the [https://www.cs.cmu.edu/~bickson/gabp/#download Gaussian Belief Propagation Matlab Package]. The [[GraphLab]]<ref>[http://www.graphlab.ml.cmu.edu/pmf.html GraphLab] {{webarchive|url=https://web.archive.org/web/20110314171151/http://www.graphlab.ml.cmu.edu/pmf.html |date=2011-03-14 }}</ref> [[collaborative filtering]] library incorporates a large scale parallel implementation of the Lanczos algorithm (in C++) for multicore.


The [http://www.cs.wm.edu/~andreas/software/ PRIMME] library also implements a Lanczos-like algorithm.
==Further reading==
* {{cite book |first1=Gene H. |last1=Golub |author-link=Gene H. Golub |first2=Charles F. |last2=Van Loan |author-link2=Charles F. Van Loan |title=Matrix Computations |location=Baltimore |publisher=Johns Hopkins University Press |year=1996 |isbn=0-8018-5414-8 |pages=470–507 |chapter=Lanczos Methods |chapter-url=https://books.google.com/books?id=mlOa7wPX6OYC&pg=PA470 }}
* {{cite journal |first1=Andrew Y. |last1=Ng |author-link=Andrew Ng |first2=Alice X. |last2=Zheng |first3=Michael I. |last3=Jordan |author-link3=Michael I. Jordan |title=Link Analysis, Eigenvectors and Stability |journal=IJCAI'01 Proceedings of the 17th International Joint Conference on Artificial Intelligence |volume=2 |pages=903–910 |date=2001 |url=https://ai.stanford.edu/~ang/papers/ijcai01-linkanalysis.pdf }}
* {{cite book|author=Erik Koch|chapter-url=http://www.cond-mat.de/events/correl19/manuscripts/koch.pdf|chapter=Exact Diagonalization and Lanczos Method|editor1= E. Pavarini |editor2=E. Koch |editor3=S. Zhang |title= Many-Body Methods for Real Materials|publisher= Jülich |year=2019|isbn=978-3-95806-400-3}}


{{Numerical linear algebra}}

Latest revision as of 09:57, 15 May 2024

The Lanczos algorithm is an iterative method devised by Cornelius Lanczos that is an adaptation of power methods to find the m "most useful" (tending towards extreme highest/lowest) eigenvalues and eigenvectors of an n × n Hermitian matrix, where m is often but not necessarily much smaller than n.[1] Although computationally efficient in principle, the method as initially formulated was not useful, due to its numerical instability.

In 1970, Ojalvo and Newman showed how to make the method numerically stable and applied it to the solution of very large engineering structures subjected to dynamic loading.[2] This was achieved using a method for purifying the Lanczos vectors (i.e. by repeatedly reorthogonalizing each newly generated vector with all previously generated ones)[2] to any degree of accuracy, which when not performed, produced a series of vectors that were highly contaminated by those associated with the lowest natural frequencies.

In their original work, these authors also suggested how to select a starting vector (i.e. use a random-number generator to select each element of the starting vector) and suggested an empirically determined method for determining m, the reduced number of vectors (i.e. it should be selected to be approximately 1.5 times the number of accurate eigenvalues desired). Soon thereafter their work was followed by Paige, who also provided an error analysis.[3][4] In 1988, Ojalvo produced a more detailed history of this algorithm and an efficient eigenvalue error test.[5]

The algorithm

Input a Hermitian matrix A of size n × n, and optionally a number of iterations m (as default, let m = n).
  • Strictly speaking, the algorithm does not need access to the explicit matrix, but only a function that computes the product of the matrix by an arbitrary vector. This function is called at most m times.
Output an n × m matrix V with orthonormal columns and a tridiagonal real symmetric matrix T = V* A V of size m × m. If m = n, then V is unitary, and A = V T V*.
Warning The Lanczos iteration is prone to numerical instability. When executed in non-exact arithmetic, additional measures (as outlined in later sections) should be taken to ensure validity of the results.
  1. Let v_1 be an arbitrary vector with Euclidean norm 1.
  2. Abbreviated initial iteration step:
    1. Let w_1' = A v_1.
    2. Let α_1 = (w_1')* v_1.
    3. Let w_1 = w_1' - α_1 v_1.
  3. For j = 2, …, m do:
    1. Let β_j = ||w_{j-1}|| (also Euclidean norm).
    2. If β_j ≠ 0, then let v_j = w_{j-1} / β_j,
      else pick as v_j an arbitrary vector with Euclidean norm 1 that is orthogonal to all of v_1, …, v_{j-1}.
    3. Let w_j' = A v_j.
    4. Let α_j = (w_j')* v_j.
    5. Let w_j = w_j' - α_j v_j - β_j v_{j-1}.
  4. Let V be the matrix with columns v_1, …, v_m. Let T be the m × m tridiagonal symmetric matrix with diagonal entries α_1, …, α_m and off-diagonal entries β_2, …, β_m.
Note A v_j = w_j' = β_{j+1} v_{j+1} + α_j v_j + β_j v_{j-1} for 1 < j < m.

There are in principle four ways to write the iteration procedure. Paige and other works show that the above order of operations is the most numerically stable.[6][7] In practice the initial vector v_1 may be taken as another argument of the procedure, with β_j = 0 and indicators of numerical imprecision being included as additional loop termination conditions.

Not counting the matrix–vector multiplication, each iteration does O(n) arithmetical operations. The matrix–vector multiplication can be done in O(dn) arithmetical operations where d is the average number of nonzero elements in a row. The total complexity is thus O(dmn), or O(dn²) if m = n; the Lanczos algorithm can be very fast for sparse matrices. Schemes for improving numerical stability are typically judged against this high performance.

The vectors v_j are called Lanczos vectors. The vector w_j' is not used after w_j is computed, and the vector w_j is not used after v_{j+1} is computed. Hence one may use the same storage for all three. Likewise, if only the tridiagonal matrix T is sought, then the raw iteration does not need v_{j-1} after having computed w_j, although some schemes for improving the numerical stability would need it later on. Sometimes the subsequent Lanczos vectors are recomputed from v_1 when needed.
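
If only T is wanted, the iteration can thus be arranged to keep just two Lanczos vectors and the working vector at any time, as in this sketch (illustrative, not from any particular library):

<syntaxhighlight lang="python">
import numpy as np

def lanczos_T_only(A, v1, m):
    """Tridiagonal coefficients only, keeping two Lanczos vectors at a time."""
    v_prev = np.zeros_like(v1)
    v = v1 / np.linalg.norm(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)                   # beta[0] unused
    w = A @ v
    alpha[0] = np.vdot(w, v).real
    w = w - alpha[0] * v
    for j in range(1, m):
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0:
            break
        v_prev, v = v, w / beta[j]       # v_{j-1} is needed for exactly one more update
        w = A @ v
        alpha[j] = np.vdot(w, v).real
        w = w - alpha[j] * v - beta[j] * v_prev
    return alpha, beta
</syntaxhighlight>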

Application to the eigenproblem


The Lanczos algorithm is most often brought up in the context of finding the eigenvalues and eigenvectors of a matrix, but whereas an ordinary diagonalization of a matrix would make eigenvectors and eigenvalues apparent from inspection, the same is not true for the tridiagonalization performed by the Lanczos algorithm; nontrivial additional steps are needed to compute even a single eigenvalue or eigenvector. Nonetheless, applying the Lanczos algorithm is often a significant step forward in computing the eigendecomposition.

If λ is an eigenvalue of T, and x its eigenvector (T x = λ x), then y = V x is a corresponding eigenvector of A with the same eigenvalue:
A y = A V x = V T V* V x = V T I x = V T x = V (λ x) = λ V x = λ y.
Thus the Lanczos algorithm transforms the eigendecomposition problem for A into the eigendecomposition problem for T.

  1. For tridiagonal matrices, there exist a number of specialised algorithms, often with better computational complexity than general-purpose algorithms. For example, if T is an m × m tridiagonal symmetric matrix then:
    • The continuant recursion allows computing the characteristic polynomial in O(m²) operations, and evaluating it at a point in O(m) operations.
    • The divide-and-conquer eigenvalue algorithm can be used to compute the entire eigendecomposition of T in O(m²) operations.
    • The Fast Multipole Method[8] can compute all eigenvalues in just O(m log m) operations.
  2. Some general eigendecomposition algorithms, notably the QR algorithm, are known to converge faster for tridiagonal matrices than for general matrices. Asymptotic complexity of tridiagonal QR is O(m²) just as for the divide-and-conquer algorithm (though the constant factor may be different); since the eigenvectors together have m² elements, this is asymptotically optimal.
  3. Even algorithms whose convergence rates are unaffected by unitary transformations, such as the power method and inverse iteration, may enjoy low-level performance benefits from being applied to the tridiagonal matrix T rather than the original matrix A. Since T is very sparse with all nonzero elements in highly predictable positions, it permits compact storage with excellent performance vis-à-vis caching. Likewise, T is a real matrix with all eigenvectors and eigenvalues real, whereas A in general may have complex elements and eigenvectors, so real arithmetic is sufficient for finding the eigenvectors and eigenvalues of T.
  4. If A is very large, then reducing m so that T is of a manageable size will still allow finding the more extreme eigenvalues and eigenvectors of A; in the m ≪ n region, the Lanczos algorithm can be viewed as a lossy compression scheme for Hermitian matrices, that emphasises preserving the extreme eigenvalues.

The combination of good performance for sparse matrices and the ability to compute several (without computing all) eigenvalues are the main reasons for choosing to use the Lanczos algorithm.

Application to tridiagonalization


Though the eigenproblem is often the motivation for applying the Lanczos algorithm, the operation the algorithm primarily performs is tridiagonalization of a matrix, for which numerically stable Householder transformations have been favoured since the 1950s. During the 1960s the Lanczos algorithm was disregarded. Interest in it was rejuvenated by the Kaniel–Paige convergence theory and the development of methods to prevent numerical instability, but the Lanczos algorithm remains the alternative algorithm that one tries only if Householder is not satisfactory.[9]

Aspects in which the two algorithms differ include:

  • Lanczos takes advantage of A being a sparse matrix, whereas Householder does not, and will generate fill-in.
  • Lanczos works throughout with the original matrix A (and has no problem with it being known only implicitly), whereas raw Householder wants to modify the matrix during the computation (although that can be avoided).
  • Each iteration of the Lanczos algorithm produces another column of the final transformation matrix V, whereas an iteration of Householder produces another factor in a unitary factorisation of V. Each factor is however determined by a single vector, so the storage requirements are the same for both algorithms, and V can be computed in O(n³) time.
  • Householder is numerically stable, whereas raw Lanczos is not.
  • Lanczos is highly parallel, with only two points of synchronisation per iteration (the computations of α_j and β_j). Householder is less parallel, having a sequence of scalar quantities computed that each depend on the previous quantity in the sequence.

Derivation of the algorithm


There are several lines of reasoning which lead to the Lanczos algorithm.

A more provident power method


The power method for finding the eigenvalue of largest magnitude and a corresponding eigenvector of a matrix A is roughly

  1. Pick a random vector u_1.
  2. For j = 1, 2, … (until the direction of u_j has converged) do:
    1. Let u_{j+1}' = A u_j.
    2. Let u_{j+1} = u_{j+1}' / ||u_{j+1}'||.
  • In the large j limit, u_j approaches the normed eigenvector corresponding to the largest magnitude eigenvalue.

A critique that can be raised against this method is that it is wasteful: it spends a lot of work (the matrix–vector products in step 2.1) extracting information from the matrix , but pays attention only to the very last result; implementations typically use the same variable for all the vectors , having each new iteration overwrite the results from the previous one. It may be desirable to instead keep all the intermediate results and organise the data.

One piece of information that trivially is available from the vectors is a chain of Krylov subspaces. One way of stating that without introducing sets into the algorithm is to claim that it computes

a subset of a basis of such that for every and all

this is trivially satisfied by as long as is linearly independent of (and in the case that there is such a dependence then one may continue the sequence by picking as an arbitrary vector linearly independent of ). A basis containing the vectors is however likely to be numerically ill-conditioned, since this sequence of vectors is by design meant to converge to an eigenvector of . To avoid that, one can combine the power iteration with a Gram–Schmidt process, to instead produce an orthonormal basis of these Krylov subspaces.

  1. Pick a random vector of Euclidean norm . Let .
  2. For do:
    1. Let .
    2. For all let . (These are the coordinates of with respect to the basis vectors .)
    3. Let . (Cancel the component of that is in .)
    4. If then let and ,
      otherwise pick as an arbitrary vector of Euclidean norm that is orthogonal to all of .

The relation between the power iteration vectors and the orthogonal vectors is that

.

Here it may be observed that we do not actually need the vectors to compute these , because and therefore the difference between and is in , which is cancelled out by the orthogonalisation process. Thus the same basis for the chain of Krylov subspaces is computed by

  1. Pick a random vector of Euclidean norm .
  2. For do:
    1. Let .
    2. For all let .
    3. Let .
    4. Let .
    5. If then let ,
      otherwise pick as an arbitrary vector of Euclidean norm that is orthogonal to all of .

A priori the coefficients satisfy

for all ;

the definition may seem a bit odd, but fits the general pattern since

Because the power iteration vectors that were eliminated from this recursion satisfy the vectors and coefficients contain enough information from that all of can be computed, so nothing was lost by switching vectors. (Indeed, it turns out that the data collected here give significantly better approximations of the largest eigenvalue than one gets from an equal number of iterations in the power method, although that is not necessarily obvious at this point.)

This last procedure is the Arnoldi iteration. The Lanczos algorithm then arises as the simplification one gets from eliminating calculation steps that turn out to be trivial when is Hermitian—in particular most of the coefficients turn out to be zero.

Elementarily, if is Hermitian then

For we know that , and since by construction is orthogonal to this subspace, this inner product must be zero. (This is essentially also the reason why sequences of orthogonal polynomials can always be given a three-term recurrence relation.) For one gets

since the latter is real on account of being the norm of a vector. For one gets

meaning this is real too.

More abstractly, if V is the matrix with columns v_1, …, v_m then the numbers h_{k,j} can be identified as elements of the matrix H = V* A V, and h_{k,j} = 0 for k > j + 1; the matrix H is upper Hessenberg. Since

the matrix H is Hermitian. This implies that H is also lower Hessenberg, so it must in fact be tridiagonal. Being Hermitian, its main diagonal is real, and since its first subdiagonal is real by construction, the same is true for its first superdiagonal. Therefore, H is a real, symmetric matrix—the matrix T of the Lanczos algorithm specification.

Simultaneous approximation of extreme eigenvalues


One way of characterising the eigenvectors of a Hermitian matrix A is as stationary points of the Rayleigh quotient r(x) = (x* A x) / (x* x), defined for nonzero x ∈ C^n.

In particular, the largest eigenvalue is the global maximum of and the smallest eigenvalue is the global minimum of .

Within a low-dimensional subspace of it can be feasible to locate the maximum and minimum of . Repeating that for an increasing chain produces two sequences of vectors: and such that and

The question then arises how to choose the subspaces so that these sequences converge at optimal rate.

From x_j, the optimal direction in which to seek larger values of r is that of the gradient ∇r(x_j), and likewise from y_j the optimal direction in which to seek smaller values of r is that of the negative gradient −∇r(y_j). In general

so the directions of interest are easy enough to compute in matrix arithmetic, but if one wishes to improve on both and then there are two new directions to take into account: and since and can be linearly independent vectors (indeed, are close to orthogonal), one cannot in general expect and to be parallel. It is not necessary to increase the dimension of by on every step if are taken to be Krylov subspaces, because then for all thus in particular for both and .

In other words, we can start with some arbitrary initial vector construct the vector spaces

and then seek such that

Since the jth power method iterate u_j belongs to L_j, it follows that an iteration to produce the x_j and y_j cannot converge slower than that of the power method, and will achieve more by approximating both eigenvalue extremes. For the subproblem of optimising r on some L_j, it is convenient to have an orthonormal basis {v_1, …, v_j} for this vector space. Thus we are again led to the problem of iteratively computing such a basis for the sequence of Krylov subspaces.

Convergence and other dynamics


When analysing the dynamics of the algorithm, it is convenient to take the eigenvalues and eigenvectors of as given, even though they are not explicitly known to the user. To fix notation, let be the eigenvalues (these are known to all be real, and thus possible to order) and let be an orthonormal set of eigenvectors such that for all .

It is also convenient to fix a notation for the coefficients of the initial Lanczos vector with respect to this eigenbasis; let for all , so that . A starting vector depleted of some eigencomponent will delay convergence to the corresponding eigenvalue, and even though this just comes out as a constant factor in the error bounds, depletion remains undesirable. One common technique for avoiding being consistently hit by it is to pick by first drawing the elements randomly according to the same normal distribution with mean and then rescale the vector to norm . Prior to the rescaling, this causes the coefficients to also be independent normally distributed stochastic variables from the same normal distribution (since the change of coordinates is unitary), and after rescaling the vector will have a uniform distribution on the unit sphere in . This makes it possible to bound the probability that for example .

The fact that the Lanczos algorithm is coordinate-agnostic – operations only look at inner products of vectors, never at individual elements of vectors – makes it easy to construct examples with known eigenstructure to run the algorithm on: make a diagonal matrix with the desired eigenvalues on the diagonal; as long as the starting vector has enough nonzero elements, the algorithm will output a general tridiagonal symmetric matrix as .

Kaniel–Paige convergence theory


After iteration steps of the Lanczos algorithm, is an real symmetric matrix, that similarly to the above has eigenvalues By convergence is primarily understood the convergence of to (and the symmetrical convergence of to ) as grows, and secondarily the convergence of some range of eigenvalues of to their counterparts of . The convergence for the Lanczos algorithm is often orders of magnitude faster than that for the power iteration algorithm.[9]: 477 

The bounds for come from the above interpretation of eigenvalues as extreme values of the Rayleigh quotient . Since is a priori the maximum of on the whole of whereas is merely the maximum on an -dimensional Krylov subspace, we trivially get . Conversely, any point in that Krylov subspace provides a lower bound for , so if a point can be exhibited for which is small then this provides a tight bound on .

The dimension Krylov subspace is

so any element of it can be expressed as for some polynomial of degree at most ; the coefficients of that polynomial are simply the coefficients in the linear combination of the vectors . The polynomial we want will turn out to have real coefficients, but for the moment we should allow also for complex coefficients, and we will write for the polynomial obtained by complex conjugating all coefficients of . In this parametrisation of the Krylov subspace, we have

Using now the expression for as a linear combination of eigenvectors, we get

and more generally

for any polynomial .

Thus

A key difference between numerator and denominator here is that the term vanishes in the numerator, but not in the denominator. Thus if one can pick to be large at but small at all other eigenvalues, one will get a tight bound on the error .

Since has many more eigenvalues than has coefficients, this may seem a tall order, but one way to meet it is to use Chebyshev polynomials. Writing for the degree Chebyshev polynomial of the first kind (that which satisfies for all ), we have a polynomial which stays in the range on the known interval but grows rapidly outside it. With some scaling of the argument, we can have it map all eigenvalues except into . Let

(in case , use instead the largest eigenvalue strictly less than ), then the maximal value of for is and the minimal value is , so

Furthermore

the quantity

(i.e., the ratio of the first eigengap to the diameter of the rest of the spectrum) is thus of key importance for the convergence rate here. Also writing

we may conclude that

The convergence rate is thus controlled chiefly by , since this bound shrinks by a factor for each extra iteration.

For comparison, one may consider how the convergence rate of the power method depends on , but since the power method primarily is sensitive to the quotient between absolute values of the eigenvalues, we need for the eigengap between and to be the dominant one. Under that constraint, the case that most favours the power method is that , so consider that. Late in the power method, the iteration vector:

[note 1]

where each new iteration effectively multiplies the -amplitude by

The estimate of the largest eigenvalue is then

so the above bound for the Lanczos algorithm convergence rate should be compared to

which shrinks by a factor of for each iteration. The difference thus boils down to that between and . In the region, the latter is more like , and performs like the power method would with an eigengap twice as large; a notable improvement. The more challenging case is however that of in which is an even larger improvement on the eigengap; the region is where the Lanczos algorithm convergence-wise makes the smallest improvement on the power method.
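
A quick numerical illustration of this comparison (a sketch; lanczos stands for an implementation such as the one given earlier, and the test matrix and gap are arbitrary choices):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.default_rng(1)
n, k = 500, 20
eigs = np.linspace(0.0, 1.0, n)
eigs[-1] = 1.05                                   # one dominant eigenvalue with a modest gap
A = np.diag(eigs)
v1 = rng.standard_normal(n)
v1 /= np.linalg.norm(v1)

alpha, beta, _ = lanczos(A, v1, k)                # k steps of Lanczos
theta = eigh_tridiagonal(alpha, beta[1:], eigvals_only=True)

u = v1.copy()
for _ in range(k):                                # k steps of the power method
    u = A @ u
    u /= np.linalg.norm(u)

print("Lanczos error:", eigs[-1] - theta[-1])
print("power method error:", eigs[-1] - np.vdot(u, A @ u).real)
</syntaxhighlight>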

Numerical stability


Stability means how much the algorithm will be affected (i.e. whether it will still produce approximate results close to the exact ones) when small numerical errors are introduced and accumulated. Numerical stability is the central criterion for judging the usefulness of implementing an algorithm on a computer with roundoff.

For the Lanczos algorithm, it can be proved that with exact arithmetic, the set of vectors v_1, …, v_m constructs an orthonormal basis, and the eigenvalues/vectors solved are good approximations to those of the original matrix. However, in practice (as the calculations are performed in floating point arithmetic where inaccuracy is inevitable), the orthogonality is quickly lost and in some cases the new vector could even be linearly dependent on the set that is already constructed. As a result, some of the eigenvalues of the resultant tridiagonal matrix may not be approximations to those of the original matrix. Therefore, the Lanczos algorithm is not very stable.

Users of this algorithm must be able to find and remove those "spurious" eigenvalues. Practical implementations of the Lanczos algorithm go in three directions to fight this stability issue (a sketch of the first approach follows the list):[6][7]

  1. Prevent the loss of orthogonality.
  2. Recover the orthogonality after the basis is generated.
  3. After the good and "spurious" eigenvalues are all identified, remove the spurious ones.
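
A minimal NumPy sketch of the first approach (full reorthogonalization): the routine below, the random test matrix and the iteration counts are illustrative assumptions, not taken from the cited implementations; it simply measures how far the computed basis drifts from orthonormality with and without reorthogonalization.

<syntaxhighlight lang="python">
import numpy as np

def lanczos(A, v1, m, reorthogonalize=False):
    """Plain three-term-recurrence Lanczos iteration (no breakdown handling; sketch only).
    Optionally reorthogonalizes each new vector against all previous ones.
    Returns the basis V and the tridiagonal coefficients (alphas, betas)."""
    n = A.shape[0]
    V = np.zeros((n, m))
    alphas, betas = np.zeros(m), np.zeros(m - 1)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= betas[j - 1] * V[:, j - 1]
        alphas[j] = V[:, j] @ w
        w -= alphas[j] * V[:, j]
        if reorthogonalize:                       # full reorthogonalization against all earlier vectors
            w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        if j < m - 1:
            betas[j] = np.linalg.norm(w)
            V[:, j + 1] = w / betas[j]
    return V, alphas, betas

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)); A = (A + A.T) / 2   # random symmetric test matrix
v1 = rng.standard_normal(500)
for reo in (False, True):
    V, _, _ = lanczos(A, v1, 100, reorthogonalize=reo)
    loss = np.max(np.abs(V.T @ V - np.eye(100)))          # deviation from orthonormality
    print(f"reorthogonalize={reo}: max |V^T V - I| = {loss:.2e}")
</syntaxhighlight>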

Variations


Variations on the Lanczos algorithm exist where the vectors involved are tall, narrow matrices instead of single vectors and the normalizing constants are small square matrices. These are called "block" Lanczos algorithms and can be much faster on computers with large numbers of registers and long memory-fetch times.

Many implementations of the Lanczos algorithm restart after a certain number of iterations. One of the most influential restarted variations is the implicitly restarted Lanczos method,[10] which is implemented in ARPACK.[11] This has led to a number of other restarted variations such as restarted Lanczos bidiagonalization.[12] Another successful restarted variation is the Thick-Restart Lanczos method,[13] which has been implemented in a software package called TRLan.[14]

Nullspace over a finite field


In 1995, Peter Montgomery published an algorithm, based on the Lanczos algorithm, for finding elements of the nullspace of a large sparse matrix over GF(2); since the set of people interested in large sparse matrices over finite fields and the set of people interested in large eigenvalue problems scarcely overlap, this is often also called the block Lanczos algorithm without causing unreasonable confusion.[citation needed]

Applications


Lanczos algorithms are very attractive because the multiplication by <math>A</math> is the only large-scale linear operation. Since weighted-term text retrieval engines implement just this operation, the Lanczos algorithm can be applied efficiently to text documents (see latent semantic indexing). Eigenvectors are also important for large-scale ranking methods such as the HITS algorithm developed by Jon Kleinberg, or the PageRank algorithm used by Google.
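
As an illustration of the point that only products with <math>A</math> are needed, the SciPy sketch below never forms <math>A = X^\mathsf{T} X</math> explicitly; the Lanczos-based solver only ever requests products <math>v \mapsto X^\mathsf{T}(Xv)</math>. The term-document matrix <math>X</math> here is randomly generated purely for illustration.

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import LinearOperator, eigsh

# Hypothetical sparse term-document matrix X (terms x documents); sizes are arbitrary.
X = sparse_random(10_000, 2_000, density=1e-3, format="csr", random_state=0)

# A = X^T X is never formed: each product is just two sparse matrix-vector multiplications.
A = LinearOperator(
    (X.shape[1], X.shape[1]),
    matvec=lambda v: X.T @ (X @ v),
    dtype=X.dtype,
)

# Largest eigenvalues of X^T X, i.e. the squared largest singular values of X.
vals = eigsh(A, k=5, which="LA", return_eigenvectors=False)
print(np.sqrt(np.sort(vals)[::-1]))   # leading singular values of X
</syntaxhighlight>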

Lanczos algorithms are also used in condensed matter physics as a method for solving Hamiltonians of strongly correlated electron systems,[15] as well as in shell model codes in nuclear physics.[16]

Implementations


The NAG Library contains several routines[17] for the solution of large-scale linear systems and eigenproblems which use the Lanczos algorithm.

MATLAB and GNU Octave come with ARPACK built-in. Both stored and implicit matrices can be analyzed through the eigs() function (MATLAB/Octave).

Similarly, in Python, the SciPy package has scipy.sparse.linalg.eigsh, which is also a wrapper for the SSEUPD and DSEUPD functions from ARPACK that use the Implicitly Restarted Lanczos Method.
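
For instance, a minimal usage sketch (the diagonal test matrix and parameter choices here are arbitrary, chosen only so that the answer is easy to verify):

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Sparse diagonal test matrix with known eigenvalues 1, 2, ..., n.
n = 10_000
A = diags(np.arange(1, n + 1, dtype=float))

# Six largest algebraic eigenvalues via ARPACK's implicitly restarted Lanczos method;
# `ncv` sets the number of Lanczos vectors kept between restarts.
vals, vecs = eigsh(A, k=6, which="LA", ncv=40)
print(vals)   # approximately [9995. 9996. 9997. 9998. 9999. 10000.]
</syntaxhighlight>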

A MATLAB implementation of the Lanczos algorithm (note precision issues) is available as a part of the Gaussian Belief Propagation Matlab Package. The GraphLab[18] collaborative filtering library incorporates a large-scale parallel implementation of the Lanczos algorithm (in C++) for multicore machines.

The PRIMME library also implements a Lanczos-like algorithm.

Notes

  1. The coefficients need not both be real, but the phase is of little importance. Nor need the components for other eigenvectors have completely disappeared, but they shrink at least as fast as that for <math>z_2</math>, so <math>u \approx z_1 + t z_2</math> describes the worst case.

References

  1. Lanczos, C. (1950). "An iteration method for the solution of the eigenvalue problem of linear differential and integral operators" (PDF). Journal of Research of the National Bureau of Standards. 45 (4): 255–282. doi:10.6028/jres.045.026.
  2. Ojalvo, I. U.; Newman, M. (1970). "Vibration modes of large structures by an automatic matrix-reduction method". AIAA Journal. 8 (7): 1234–1239. Bibcode:1970AIAAJ...8.1234N. doi:10.2514/3.5878.
  3. Paige, C. C. (1971). The computation of eigenvalues and eigenvectors of very large sparse matrices (Ph.D. thesis). U. of London. OCLC 654214109.
  4. Paige, C. C. (1972). "Computational Variants of the Lanczos Method for the Eigenproblem". J. Inst. Maths Applics. 10 (3): 373–381. doi:10.1093/imamat/10.3.373.
  5. Ojalvo, I. U. (1988). "Origins and advantages of Lanczos vectors for large dynamic systems". Proc. 6th Modal Analysis Conference (IMAC), Kissimmee, FL. pp. 489–494.
  6. Cullum; Willoughby (1985). Lanczos Algorithms for Large Symmetric Eigenvalue Computations. Vol. 1. ISBN 0-8176-3058-9.
  7. Yousef Saad (1992-06-22). Numerical Methods for Large Eigenvalue Problems. ISBN 0-470-21820-7.
  8. Coakley, Ed S.; Rokhlin, Vladimir (2013). "A fast divide-and-conquer algorithm for computing the spectra of real symmetric tridiagonal matrices". Applied and Computational Harmonic Analysis. 34 (3): 379–414. doi:10.1016/j.acha.2012.06.003.
  9. Golub, Gene H.; Van Loan, Charles F. (1996). Matrix Computations (3rd ed.). Baltimore: Johns Hopkins Univ. Press. ISBN 0-8018-5413-X.
  10. D. Calvetti; L. Reichel; D. C. Sorensen (1994). "An Implicitly Restarted Lanczos Method for Large Symmetric Eigenvalue Problems". Electronic Transactions on Numerical Analysis. 2: 1–21.
  11. R. B. Lehoucq; D. C. Sorensen; C. Yang (1998). ARPACK Users Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM. doi:10.1137/1.9780898719628. ISBN 978-0-89871-407-4.
  12. E. Kokiopoulou; C. Bekas; E. Gallopoulos (2004). "Computing smallest singular triplets with implicitly restarted Lanczos bidiagonalization" (PDF). Appl. Numer. Math. 49: 39–61. doi:10.1016/j.apnum.2003.11.011.
  13. Kesheng Wu; Horst Simon (2000). "Thick-Restart Lanczos Method for Large Symmetric Eigenvalue Problems". SIAM Journal on Matrix Analysis and Applications. 22 (2). SIAM: 602–616. doi:10.1137/S0895479898334605.
  14. Kesheng Wu; Horst Simon (2001). "TRLan software package". Archived from the original on 2007-07-01. Retrieved 2007-06-30.
  15. Chen, HY; Atkinson, W.A.; Wortis, R. (July 2011). "Disorder-induced zero-bias anomaly in the Anderson-Hubbard model: Numerical and analytical calculations". Physical Review B. 84 (4): 045113. arXiv:1012.1031. Bibcode:2011PhRvB..84d5113C. doi:10.1103/PhysRevB.84.045113. S2CID 118722138.
  16. Shimizu, Noritaka (21 October 2013). "Nuclear shell-model code for massive parallel computation, "KSHELL"". arXiv:1310.5431 [nucl-th].
  17. The Numerical Algorithms Group. "Keyword Index: Lanczos". NAG Library Manual, Mark 23. Retrieved 2012-02-09.
  18. GraphLab. Archived 2011-03-14 at the Wayback Machine.

Further reading
