'''Locally Optimal Block Preconditioned Conjugate Gradient''' ('''LOBPCG''') is a [[matrix-free methods|matrix-free method]] for finding the largest (or smallest) [[eigenvalues]] and the corresponding [[eigenvectors]] of a symmetric [[generalized eigenvalue problem]]


:<math>A x= \lambda B x,</math>
for a given pair <math>(A, B)</math> of complex [[Hermitian matrix|Hermitian]] or real symmetric matrices, where
the matrix <math>B</math> is also assumed [[positive-definite matrix|positive-definite]].


==Background==
[[Kantorovich]] in 1948 proposed calculating the smallest [[eigenvalue]] <math>\lambda_1</math> of a symmetric matrix <math>A</math> by [[steepest descent]] using a direction <math>r = Ax-\lambda (x) x</math> of a scaled [[gradient]] of a [[Rayleigh quotient]] <math>\lambda(x) = (x, Ax)/(x, x)</math> in a [[scalar product]] <math>(x, y) = x'y</math>, with the step size computed by minimizing the Rayleigh quotient in the [[linear span]] of the vectors <math>x</math> and <math>r</math>, i.e. in a locally optimal manner. Samokish<ref name="S58">{{Cite journal| title = The steepest descent method for an eigenvalue problem with semi-bounded operators| journal = Izvestiya Vuzov, Math.| issue = 5| pages = 105–114| year = 1958| last1 = Samokish| first1 = B.A. }}</ref> proposed applying a [[preconditioner]] <math>T</math> to the residual vector <math>r</math> to generate the preconditioned direction <math>w = T r</math> and derived asymptotic, as <math>x</math> approaches the [[eigenvector]], convergence rate bounds. [[Yevgeny Dyakonov|D'yakonov]] suggested<ref name="D">{{cite book |title= Optimization in solving elliptic problems |last= D'yakonov |first= E. G. |year= 1996 |publisher= CRC-Press |isbn= 978-0-8493-2872-5 |pages= 592 |url= https://archive.org/details/optimizationinso0000diak |url-access= registration }}</ref> spectrally equivalent [[preconditioning]] and derived non-asymptotic convergence rate bounds. Block locally optimal multi-step steepest descent for eigenvalue problems was described by Cullum and Willoughby.<ref name="CW">{{Cite book| title = Lanczos algorithms for large symmetric eigenvalue computations. Vol. 1 (Reprint of the 1985 original)| year = 2002| last1 = Cullum| first1 = Jane K.|author1-link= Jane Cullum | last2 = Willoughby| first2 = Ralph A.| publisher = [[Society for Industrial and Applied Mathematics]]}}</ref> Local minimization of the Rayleigh quotient on the subspace spanned by the current approximation, the current residual and the previous approximation, as well as its block version, appeared in a 1987 paper by Knyazev.<ref name="K87">{{Cite journal| title = Convergence rate estimates for iterative methods for mesh symmetric eigenvalue problem| journal = Soviet Journal of Numerical Analysis and Mathematical Modelling | volume = 2| number =5| pages = 371–396| year = 1987| last1 = Knyazev | first1 = Andrew V. |doi=10.1515/rnam.1987.2.5.371| s2cid = 121473545 }}</ref> The preconditioned version was analyzed by Knyazev in 1991<ref name="K91">{{Cite book |doi=10.1007/978-3-0348-6332-2_11 |chapter=A Preconditioned Conjugate Gradient Method for Eigenvalue Problems and its Implementation in a Subspace |title=Numerical Treatment of Eigenvalue Problems Vol. 5 |series=International Series of Numerical Mathematics |year=1991 |last1=Knyazev |first1=A. V. |volume=96 |pages=143–154 |isbn=978-3-0348-6334-6 |editor-last1=Albrecht |editor-first1=J. |editor-last2=Collatz |editor-first2=L. |editor-last3=Hagedorn |editor-first3=P. |editor-last4=Velte |editor-first4=W. }}</ref> and 1998.<ref name="K98">{{Cite journal| title = Preconditioned eigensolvers - an oxymoron?| journal = [[Electronic Transactions on Numerical Analysis]]| volume = 7| pages = 104–123| year = 1998| last1 = Knyazev | first1 = Andrew V. }}</ref>

==Main features==
Source:<ref name="K2017">{{cite arXiv| title = Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method (LOBPCG)| eprint = 1708.08354| year = 2017| last1 = Knyazev | first1 = Andrew| class = cs.NA}}</ref>

* [[Matrix-free methods|Matrix-free]], i.e. does not require storing the coefficient matrix explicitly, but can access the matrix by evaluating matrix-vector products (see the usage sketch after this list).
* [[Factorization]]-free, i.e. does not require any [[matrix decomposition]] even for a [[generalized eigenvalue problem]].
* The costs per iteration and the memory use are competitive with those of the [[Lanczos method]], computing a single extreme eigenpair of a symmetric matrix.
* Linear convergence is theoretically guaranteed and practically observed.
* Accelerated convergence due to direct [[preconditioning]], in contrast to the [[Lanczos method]]; the preconditioner may be variable and non-symmetric as well as fixed and positive definite.
* Allows trivial incorporation of efficient [[domain decomposition]] and [[multigrid]] techniques via preconditioning.
* Allows warm starts and computes an approximation to the eigenvector on every iteration.
* More numerically stable compared to the [[Lanczos method]], and can operate in low-precision computer arithmetic.
* Easy to implement, with many versions already available.
* Blocking allows utilizing highly efficient matrix-matrix operations, e.g., [[BLAS]] 3.
* The block size can be tuned to balance convergence speed vs. computer costs of orthogonalizations and the [[Rayleigh-Ritz method]] on every iteration.
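For illustration, a minimal usage sketch with the [[SciPy]] implementation exercises several of the features above: sparse matrix input, simple preconditioning, blocking, and a random Gaussian initial block. The matrix, block size, and tolerance here are illustrative assumptions rather than recommendations.

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import lobpcg

n, k = 10000, 4
# Sparse 1-D Laplacian; LOBPCG only needs products of the matrix with blocks.
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
M = diags(1.0 / A.diagonal())  # simplistic diagonal (Jacobi) preconditioner

rng = np.random.default_rng(0)
X0 = rng.normal(size=(n, k))   # random Gaussian initial block
# A problem-specific preconditioner, e.g., multigrid, would converge faster.
eigenvalues, eigenvectors = lobpcg(A, X0, M=M, largest=False,
                                   tol=1e-6, maxiter=200)
</syntaxhighlight>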

==Algorithm==
===Single-vector version===
====Preliminaries: [[Gradient descent]] for eigenvalue problems====
The method performs an [[iterative]] maximization (or minimization) of the generalized [[Rayleigh quotient]]


:<math>\rho(x) := \rho(A,B; x) :=\frac{x^T A x}{x^T B x},</math>
which results in finding largest (or smallest) eigenpairs of <math>A x= \lambda B x.</math>


The direction of the steepest ascent, which is the [[gradient]], of the generalized [[Rayleigh quotient]] is positively proportional to the vector


:<math>r := Ax - \rho(x) Bx,</math>


called the eigenvector [[Residual (numerical analysis)|residual]]. If a [[preconditioner]] <math>T</math> is available, it is applied to the residual and gives the vector


:<math>w := Tr,</math>


called the preconditioned residual. Without preconditioning, we set
<math>T := I</math> and so <math>w := r</math>. An iterative method


:<math>x^{i+1} := x^i + \alpha^i T(Ax^i - \rho(x^i) Bx^i),</math>


or, in short,


:<math>x^{i+1} := x^i + \alpha^i w^i,\,</math>

:<math>w^i := T r^i,\,</math>

:<math>r^i := Ax^i - \rho(x^i) Bx^i,</math>


is known as preconditioned [[steepest ascent]] (or descent), where the scalar
<math>\alpha^i</math> is called the step size. The optimal step size can be determined by maximizing the Rayleigh quotient, i.e.,


:<math>x^{i+1} := \arg\max_{y\in span\{x^i,w^i\}} \rho(y)</math>


(or <math>\arg\min</math> in case of minimizing),
in which case the method is called locally optimal.
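The locally optimal step can be sketched in a few lines of [[Python (programming language)|Python]] with [[NumPy]] and [[SciPy]], solving the two-dimensional Rayleigh–Ritz problem by a dense generalized eigensolver. This is an illustrative sketch of the maximization variant for dense <math>A</math>, <math>B</math>, and an optional preconditioner <math>T</math>, not a reference implementation.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh

def locally_optimal_step(A, B, x, T=None):
    """One locally optimal (preconditioned) steepest-ascent step."""
    rho = (x @ (A @ x)) / (x @ (B @ x))  # Rayleigh quotient at x
    r = A @ x - rho * (B @ x)            # eigenvector residual
    w = r if T is None else T @ r        # preconditioned residual
    S = np.column_stack([x, w])          # basis of span{x, w}
    # Rayleigh-Ritz: 2-by-2 generalized eigenproblem on the trial subspace
    theta, C = eigh(S.T @ (A @ S), S.T @ (B @ S))
    x_new = S @ C[:, -1]                 # Ritz vector of the largest Ritz value
    return x_new / np.linalg.norm(x_new), theta[-1]
</syntaxhighlight>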
====Three-term recurrence====
To dramatically accelerate the convergence of the locally optimal preconditioned steepest ascent (or descent), one extra vector can be added to the two-term [[recurrence relation]] to make it three-term:


:<math>x^{i+1} := \arg\max_{y\in span\{x^i,w^i,x^{i-1}\}} \rho(y)</math>


(use <math>\arg\min</math> in case of minimizing). The maximization/minimization of the Rayleigh quotient in a 3-dimensional subspace can be performed numerically by the [[Rayleigh–Ritz method]]. Adding more vectors, see, e.g., [[Richardson extrapolation]], does not result in significant acceleration<ref name="AK2001">{{Cite journal| doi = 10.1137/S1064827500366124| title = Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method| journal = [[SIAM Journal on Scientific Computing]]| volume = 23| issue = 2| pages = 517–541| year = 2001| last1 = Knyazev | first1 = Andrew V. | bibcode = 2001SJSC...23..517K| s2cid = 7077751}}</ref> but increases computation costs, so is not generally recommended.
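Continuing the sketch above (same imports; <code>w</code> denotes the preconditioned residual at <code>x</code>, and <code>x_prev</code> the previous iterate, both assumed given), the three-term recurrence merely enlarges the trial subspace:

<syntaxhighlight lang="python">
def three_term_step(A, B, x, w, x_prev):
    """Rayleigh-Ritz on span{x, w, x_prev} (illustrative, maximization)."""
    S = np.column_stack([x, w, x_prev])
    theta, C = eigh(S.T @ (A @ S), S.T @ (B @ S))
    return S @ C[:, -1], theta[-1]  # use C[:, 0], theta[0] when minimizing
</syntaxhighlight>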


====Numerical stability improvements====
As the iterations converge, the vectors <math>x^i</math> and <math>x^{i-1}</math> become nearly [[linearly dependent]], resulting in a precision loss and making the [[Rayleigh–Ritz method]] numerically unstable in the presence of round-off errors. The loss of precision may be avoided by substituting the vector <math>x^{i-1}</math> with a vector <math>p^i</math> that may be further away from <math>x^{i}</math> in the basis of the three-dimensional subspace <math>span\{x^i, w^i, x^{i-1}\}</math>, while keeping the subspace unchanged and avoiding [[orthogonalization]] or any other extra operations.<ref name="AK2001"/> Furthermore, orthogonalizing the basis of the three-dimensional subspace may be needed for [[ill-conditioned]] eigenvalue problems to improve stability and attainable accuracy.
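In code, the substitution amounts to carrying a direction <code>p</code> updated from the Ritz coefficients, instead of storing the previous iterate. The fragment below continues the sketches above for the maximization variant; on the very first iteration the basis is just <code>{x, w}</code> and <code>p</code> is not yet defined.

<syntaxhighlight lang="python">
# Same subspace, better-conditioned basis: span{x, w, p} = span{x, w, x_prev}
S = np.column_stack([x, w, p])
theta, C = eigh(S.T @ (A @ S), S.T @ (B @ S))
p = w * C[1, -1] + p * C[2, -1]  # implicitly computed difference x_new - C[0,-1]*x
x = x * C[0, -1] + p             # equals S @ C[:, -1], the new iterate
</syntaxhighlight>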


====Krylov subspace analogs====
This is a single-vector version of the LOBPCG method, one of the possible generalizations of the [[Preconditioner|preconditioned]] [[Conjugate gradient method|conjugate gradient]] linear solvers to the case of symmetric [[eigenvalue]] problems.<ref name="AK2001" /> Even in the trivial case <math>T=I</math> and <math>B=I</math> the resulting approximation with <math>i>3</math> will be different from that obtained by the [[Lanczos algorithm]], although both approximations will belong to the same [[Krylov subspace]].


====Practical use scenarios====
Extreme simplicity and high efficiency of the single-vector version of LOBPCG make it attractive for eigenvalue-related applications under severe hardware limitations, ranging from [[spectral clustering]] based real-time [[anomaly detection]] via [[graph partitioning]] on embedded [[ASIC]] or [[FPGA]] to modelling physical phenomena of record computing complexity on exascale [[TOP500]] supercomputers.

===Block version===
====Summary====
Subsequent eigenpairs can be computed one-by-one via single-vector LOBPCG supplemented with an orthogonal deflation or simultaneously as a block. In the former approach, imprecisions in already computed approximate eigenvectors additively affect the accuracy of the subsequently computed eigenvectors, thus increasing the error with every new computation. Iterating several approximate [[eigenvectors]] together in a block in a locally optimal fashion in the block version of LOBPCG<ref name="AK2001" /> allows fast, accurate, and robust computation of eigenvectors, including those corresponding to nearly-multiple eigenvalues, where the single-vector LOBPCG suffers from slow convergence. The block size can be tuned to balance numerical stability vs. convergence speed vs. computer costs of orthogonalizations and the Rayleigh–Ritz method on every iteration.

====Core design====
The block approach in LOBPCG replaces single-vectors <math>x^{i},\, w^{i},</math> and <math>p^{i}</math> with block-vectors, i.e. matrices <math>X^{i},\, W^{i},</math> and <math>P^{i}</math>, where, e.g., every column of <math>X^{i}</math> approximates one of the eigenvectors. All columns are iterated simultaneously, and the next matrix of approximate eigenvectors <math>X^{i+1}</math> is determined by the [[Rayleigh–Ritz method]] on the subspace spanned by all columns of matrices <math>X^{i},\, W^{i},</math> and <math>P^{i}</math>. Each column of <math>W^{i}</math> is computed simply as the preconditioned residual for every column of <math>X^{i}.</math> The matrix <math>P^{i}</math> is determined such that the subspaces spanned by the columns of <math>[X^{i},\, P^{i}]</math> and of <math>[X^{i},\, X^{i-1}]</math> are the same.
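A minimal sketch of one block iteration for the minimization variant, assuming dense matrices and a full-rank basis <math>[X^{i},\, W^{i},\, P^{i}]</math>; real implementations must guard against near linear dependence of the basis, as discussed below.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh

def block_step(A, B, X, W, P):
    """One Rayleigh-Ritz update on the subspace spanned by [X, W, P]."""
    k = X.shape[1]
    S = np.hstack([X, W, P])  # up to 3k basis vectors
    theta, C = eigh(S.T @ (A @ S), S.T @ (B @ S))
    return S @ C[:, :k], theta[:k]  # the k smallest Ritz pairs
</syntaxhighlight>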

====Numerical stability vs. efficiency====
The outcome of the [[Rayleigh–Ritz method]] is determined by the subspace spanned by all columns of matrices <math>X^{i},\, W^{i},</math> and <math>P^{i}</math>, where a basis of the subspace can theoretically be arbitrary. However, in inexact computer arithmetic the [[Rayleigh–Ritz method]] becomes numerically unstable if some of the basis vectors are approximately linearly dependent. Numerical instabilities typically occur, e.g., if some of the eigenvectors in the iterative block already reach attainable accuracy for a given computer precision and are especially prominent in low precision, e.g., [[single precision]].

The art of the multiple different implementations of LOBPCG is to ensure numerical stability of the [[Rayleigh–Ritz method]] at minimal computing costs by choosing a good basis of the subspace. The arguably most stable approach of making the basis vectors orthogonal, e.g., by the [[Gram–Schmidt process]], is also the most computationally expensive. For example, LOBPCG implementations<ref name="matlab">[[MATLAB]] [https://www.mathworks.com/matlabcentral/fileexchange/48-lobpcg-m File Exchange function LOBPCG]</ref><ref name="scipy">[[SciPy]] [https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html?highlight=lobpcg#scipy.sparse.linalg.lobpcg sparse linear algebra function lobpcg]</ref> utilize the unstable but efficient [[Cholesky decomposition]] of the [[Ordinary least squares#Normal matrix|normal matrix]], which is performed only on the individual matrices <math>W^{i}</math> and <math>P^{i}</math>, rather than on the whole subspace. The constantly increasing amount of computer memory allows typical block sizes nowadays in the <math>10^3-10^4</math> range, where the percentage of compute time spent on orthogonalizations and the Rayleigh–Ritz method starts dominating.
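Such a Cholesky-based <math>B</math>-orthonormalization of a single block can be sketched as follows; the Gram matrix squares the condition number of the block, so this approach is cheap but can fail for an ill-conditioned basis.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def b_orthonormalize(B, V):
    """Make the columns of V B-orthonormal via Cholesky of the Gram matrix."""
    R = cholesky(V.T @ (B @ V))  # upper triangular, V^T B V = R^T R
    return solve_triangular(R, V.T, trans="T").T  # computes V @ inv(R)
</syntaxhighlight>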

====Locking of previously converged eigenvectors====
Block methods for eigenvalue problems that iterate subspaces commonly have some of the iterated eigenvectors converging faster than others, which motivates locking the already converged eigenvectors, i.e., removing them from the iterative loop, in order to eliminate unnecessary computations and improve numerical stability. Simply removing an eigenvector is likely to result in forming its duplicate among the still-iterating vectors. The fact that the eigenvectors of symmetric eigenvalue problems are pair-wise orthogonal suggests keeping all iterative vectors orthogonal to the locked vectors.

Locking can be implemented in different ways that maintain numerical accuracy and stability while minimizing compute costs. For example, LOBPCG implementations<ref name="matlab"/><ref name="scipy"/> follow<ref name="AK2001" /><ref>{{Cite conference |
first1 = A. | last1 =Knyazev | title = Hard and soft locking in iterative methods for symmetric eigenvalue problems | conference = Eighth Copper Mountain Conference on Iterative Methods March 28 - April 2, 2004 | year = 2004 | doi = 10.13140/RG.2.2.11794.48327| url = https://www.researchgate.net/publication/343530965}}</ref> in separating hard locking, i.e. a deflation by restriction, where the locked eigenvectors serve as a code input and do not change, from soft locking, where the locked vectors do not participate in the typically most expensive iterative step of computing the residuals, but fully participate in the Rayleigh–Ritz method and thus are allowed to be changed by it.
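In the [[SciPy]] implementation, for example, hard locking is exposed through the constraint argument <code>Y</code> of <code>lobpcg</code>: the iterations are performed in the <math>B</math>-orthogonal complement of the locked columns. A toy sketch, with an illustrative diagonal test matrix:

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import lobpcg

n = 1000
A = diags(np.arange(1.0, n + 1.0))  # eigenvalues 1, 2, ..., n
rng = np.random.default_rng(0)

# Compute and "lock" the 3 smallest eigenpairs first ...
_, Y = lobpcg(A, rng.normal(size=(n, 3)), largest=False,
              tol=1e-8, maxiter=200)
# ... then iterate for the next 3 in the orthogonal complement of Y,
# i.e., hard locking, a deflation by restriction:
vals, vecs = lobpcg(A, rng.normal(size=(n, 3)), Y=Y, largest=False,
                    tol=1e-8, maxiter=200)
print(vals)  # approximately [4. 5. 6.]
</syntaxhighlight>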

====Modifications, LOBPCG II====
LOBPCG includes all columns of the matrices <math>X^{i},\, W^{i},</math> and <math>P^{i}</math> into the [[Rayleigh–Ritz method]], resulting in an up to <math>3k</math>-by-<math>3k</math> eigenvalue problem that needs to be solved and up to <math>9k^2</math> [[dot products]] to compute at every iteration, where <math>k</math> denotes the block size, i.e., the number of columns. For large block sizes <math>k</math> this starts dominating compute and I/O costs and limiting parallelization across multiple compute devices running simultaneously.

The original LOBPCG paper<ref name="AK2001" /> describes a modification, called LOBPCG II, that addresses this problem by running the single-vector version of the LOBPCG method for each desired eigenpair, with the Rayleigh–Ritz procedure solving <math>k</math> projected eigenvalue problems of size 3-by-3. The global Rayleigh–Ritz procedure for all <math>k</math> eigenpairs is performed on every iteration, but only on the columns of the matrix <math>X^{i}</math>, thus reducing the number of the necessary [[dot products]] to <math>k^2+3k</math> from <math>9k^2</math> and the size of the global projected eigenvalue problem to <math>k</math>-by-<math>k</math> from <math>3k</math>-by-<math>3k</math> on every iteration.
Reference <ref name="PPCG2015">{{Cite journal|
first1 = E. | last1 = Vecharynski| first2 = C. | last2 = Yang| first3 = J.E. | last3 = Pask| title = A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a hermitian matrix | journal = J. Comput. Phys. | year = 2015 | volume = 290 | pages = 73–89| doi = 10.1016/j.jcp.2015.02.030| arxiv = 1407.7506| bibcode = 2015JCoPh.290...73V| s2cid = 43741860| url = https://www.sciencedirect.com/science/article/pii/S002199911500100X}}</ref> goes further, applying the LOBPCG algorithm to each approximate eigenvector separately, i.e., running the unblocked version of the LOBPCG method for each desired eigenpair for a fixed number of iterations. The Rayleigh–Ritz procedures in these runs only need to solve a set of 3-by-3 projected eigenvalue problems. The global Rayleigh–Ritz procedure for all desired eigenpairs is only applied periodically, at the end of a fixed number of unblocked LOBPCG iterations.

Such modifications may be less robust compared to the original LOBPCG. Individually running branches of the single-vector LOBPCG may not follow continuous iterative paths, instead flipping and creating duplicated approximations to the same eigenvector. The single-vector LOBPCG may be unsuitable for clustered eigenvalues, while separate small-block LOBPCG runs require determining their block sizes automatically during the iterations, since the number of the clusters of eigenvalues and their sizes may be unknown a priori.

==Convergence theory and practice==
LOBPCG by construction is guaranteed<ref name="AK2001" /> to minimize the [[Rayleigh quotient]] not slower than the block steepest [[gradient descent]], which has a comprehensive convergence theory. Every [[eigenvector]] is a stationary point of the [[Rayleigh quotient]], where the [[gradient]] vanishes. Thus, the [[gradient descent]] may slow down in the vicinity of any [[eigenvector]]; however, it is guaranteed to either converge to the eigenvector with a linear convergence rate or, if this eigenvector is a [[saddle point]], the iterative [[Rayleigh quotient]] is more likely to drop down below the corresponding eigenvalue and start converging linearly to the next eigenvalue below. The worst value of the linear convergence rate has been determined<ref name="AK2001" /> and depends on the relative gap between the eigenvalue and the rest of the matrix [[spectrum]] and the quality of the [[preconditioner]], if present.

For a general matrix, there is evidently no way to predict the eigenvectors and thus generate the initial approximations that always work well. The iterative solution by LOBPCG may be sensitive to the initial approximations of the eigenvectors, e.g., taking longer to converge or slowing down while passing intermediate eigenpairs. Moreover, in theory, one cannot guarantee convergence necessarily to the smallest eigenpair, although the probability of such a miss is zero. A good-quality [[random]] [[Gaussian]] distribution with zero [[mean]] is commonly the default in LOBPCG to generate the initial approximations. To fix the initial approximations, one can select a fixed seed for the [[random number generator]].

In contrast to the [[Lanczos method]], LOBPCG rarely exhibits asymptotic [[superlinear convergence]] in practice.

==Partial [[Principal component analysis]] (PCA) and [[Singular Value Decomposition]] (SVD)==
LOBPCG can be trivially adapted for computing several largest [[singular values]] and the corresponding singular vectors (partial SVD), e.g., for [[Principal component analysis#Iterative computation|iterative computation of PCA]], for a data matrix {{math|'''D'''}} with zero mean, without explicitly computing the [[covariance]] matrix {{math|'''D<sup>T</sup>D'''}}, i.e. in [[matrix-free methods|matrix-free fashion]]. The main calculation is the evaluation of the product {{math|'''D<sup>T</sup>(D X)'''}} of the covariance matrix {{math|'''D<sup>T</sup>D'''}} and the block-vector {{math|'''X'''}} that iteratively approximates the desired singular vectors. PCA needs the largest eigenvalues of the covariance matrix, while LOBPCG is typically implemented to calculate the smallest ones. A simple workaround is to negate the function, substituting {{math|'''-D<sup>T</sup>(D X)'''}} for {{math|'''D<sup>T</sup>(D X)'''}} and thus reversing the order of the eigenvalues, since LOBPCG does not care if the matrix of the eigenvalue problem is positive definite or not.<ref name="matlab"/>
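The following illustrative [[SciPy]] sketch applies this negation trick matrix-free through a <code>LinearOperator</code>, never forming the covariance matrix explicitly; the sizes, tolerance, and variable names are assumptions made for the example.

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse.linalg import LinearOperator, lobpcg

rng = np.random.default_rng(0)
m, n, k = 10000, 200, 5
D = rng.normal(size=(m, n))
D -= D.mean(axis=0)  # zero-mean data matrix

# Negation reverses the eigenvalue order, so the "smallest" eigenvalues
# of the operator correspond to the largest principal components of D.
cov = LinearOperator((n, n), matvec=lambda x: -(D.T @ (D @ x)),
                     matmat=lambda X: -(D.T @ (D @ X)), dtype=D.dtype)

X0 = rng.normal(size=(n, k))
vals, vecs = lobpcg(cov, X0, largest=False, tol=1e-6, maxiter=200)
# vecs: principal axes; -vals: the largest eigenvalues of D^T D
</syntaxhighlight>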

LOBPCG for PCA and SVD is implemented in SciPy since release 1.4.0.<ref>[https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html LOBPCG for SVDS] in [[SciPy]]</ref>
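With the data matrix {{math|'''D'''}} from the sketch above, the corresponding partial SVD call selecting LOBPCG as the underlying eigensolver is a one-liner:

<syntaxhighlight lang="python">
from scipy.sparse.linalg import svds

U, s, Vt = svds(D, k=5, solver="lobpcg")  # 5 largest singular triplets
</syntaxhighlight>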

==General software implementations==
LOBPCG's inventor, [[Andrei Knyazev (mathematician)|Andrew Knyazev]], published a reference implementation called Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX)<ref>[[GitHub]] [https://github.com/lobpcg/blopex BLOPEX]</ref><ref>{{Cite journal | doi = 10.1137/060661624| title = Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in Hypre and PETSc| journal = SIAM Journal on Scientific Computing| volume = 29| issue = 5| pages = 2224| year = 2007| last1 = Knyazev | first1 = A. V.| last2 = Argentati | first2 = M. E.| last3 = Lashuk | first3 = I.| last4 = Ovtchinnikov | first4 = E. E.| arxiv = 0705.2626| bibcode = 2007SJSC...29.2224K| s2cid = 266}}</ref> with interfaces to [[Portable, Extensible Toolkit for Scientific Computation|PETSc]], [[hypre]], and Parallel Hierarchical Adaptive MultiLevel method (PHAML).<ref>PHAML [https://math.nist.gov/~WMitchell/phaml/phaml.html#goals BLOPEX interface to LOBPCG]</ref> Other implementations are available in, e.g., [[GNU Octave]],<ref>[https://octave.sourceforge.io/linear-algebra/function/lobpcg.html Octave linear-algebra function lobpcg]</ref> [[MATLAB]] (including for distributed or tiling arrays),<ref name="matlab" /> [[Java (programming language)|Java]],<ref>[https://code.google.com/archive/p/sparse-eigensolvers-java/ Java LOBPCG] at [[Google Code]]</ref> Anasazi ([[Trilinos]]),<ref>[https://github.com/trilinos/Trilinos/blob/master/packages/anasazi/src/AnasaziLOBPCG.hpp Anasazi Trilinos LOBPCG] at [[GitHub]]</ref> [[SLEPc]],<ref>[http://slepc.upv.es/documentation/current/src/eps/impls/cg/lobpcg/lobpcg.c.html Native SLEPc LOBPCG]</ref><ref>[[SLEPc]] [[BLOPEX]] [https://bitbucket.org/joseroman/blopex interface to LOBPCG]</ref> [[SciPy]],<ref name="scipy" /> [[Julia (programming language)|Julia]],<ref>[[Julia (programming language)|Julia]] [https://github.com/JuliaMath/IterativeSolvers.jl/blob/master/src/lobpcg.jl LOBPCG] at [[GitHub]]</ref> MAGMA,<ref>{{Cite journal | url = http://dl.acm.org/citation.cfm?id=2872609 | title = Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product| journal = Proceedings of the Symposium on High Performance Computing (HPC '15). Society for Computer Simulation International, San Diego, CA, USA| pages = 75–82| year = 2015| last1 = Anzt| first1 = Hartwig| last2 = Tomov | first2 = Stanimir| last3 = Dongarra | first3 = Jack| series = HPC '15| isbn = 9781510801011}}</ref> [[Pytorch]],<ref>[[Pytorch]] [https://github.com/pytorch/pytorch/blob/master/torch/_lobpcg.py LOBPCG] at [[GitHub]]</ref> [[Rust (programming language)|Rust]],<ref>[[Rust (programming language)|Rust]] [https://github.com/rust-ndarray/ndarray-linalg/pull/184 LOBPCG] at [[GitHub]]</ref> [[OpenMP]] and [[OpenACC]],<ref>{{Cite conference | url = https://waccpd.org/wp-content/uploads/2019/12/Evaluation-of-Directive-Based-GPU-Programming-Models-on-a-Block-Eigensolver-with-Consideration-of-Large-Sparse-Matrices.pdf | title = Evaluation of Directive-based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices | journal =Seventh Workshop on Accelerator Programming Using Directives, SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis| year = 2019| last1 = Rabbi| first1 = Fazlay | last2 = Daley| first2 = Christopher S.| last3 = Aktulga | first3 = Hasan M. | last4 = Wright| first4 = Nicholas J.}}</ref> [[CuPy]] (a [[NumPy]]-compatible array library accelerated by [[CUDA]]),<ref>CuPy: A [[NumPy]]-compatible array library accelerated by [[CUDA]] [https://github.com/cupy/cupy/blob/master/cupyx/scipy/sparse/linalg/_lobpcg.py LOBPCG] at [[GitHub]]</ref>
[[Google JAX]],<ref> [[Google JAX]] [https://github.com/google/jax/pull/10962 LOBPCG initial merge] at [[GitHub]]</ref>
and [[NVIDIA]] AMGX.<ref>[[NVIDIA]] AMGX [https://github.com/NVIDIA/AMGX/blob/master/eigen_examples/LOBPCG LOBPCG] at [[GitHub]]</ref> LOBPCG is implemented,<ref>{{Cite journal | doi = 10.1016/j.jcp.2019.07.003 | url = https://www.sciencedirect.com/science/article/pii/S0021999119304875 | title = Low-rank Riemannian eigensolver for high-dimensional Hamiltonians| journal = Journal of Computational Physics| pages = 718–737| year = 2019| last1 = Rakhuba| first1 = Maxim| last2 = Novikov| first2 = Alexander| last3 = Osedelets| first3 = Ivan| volume = 396 | arxiv = 1811.11049| bibcode = 2019JCoPh.396..718R | s2cid = 119679555 }}</ref> but not included, in [[TensorFlow]].

==Applications==
===[[Data mining]]===
Software packages [[scikit-learn]] and Megaman<ref>{{Cite journal | url = http://jmlr.org/papers/v17/16-109.html | title = Megaman: Scalable Manifold Learning in Python| journal = Journal of Machine Learning Research| volume = 17| issue = 148| pages = 1–5| year = 2016| last1 = McQueen| first1 = James | display-authors = etal| bibcode = 2016JMLR...17..148M}}</ref> use LOBPCG to scale [[spectral clustering]]<ref>{{Cite web | url=http://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html |title = Sklearn.cluster.SpectralClustering — scikit-learn 0.22.1 documentation}}</ref> and [[manifold learning]]<ref>{{Cite web | url=http://scikit-learn.org/stable/modules/generated/sklearn.manifold.spectral_embedding.html |title = Sklearn.manifold.spectral_embedding — scikit-learn 0.22.1 documentation}}</ref> via [[Nonlinear dimensionality reduction#Laplacian eigenmaps|Laplacian eigenmaps]] to large data sets. [[NVIDIA]] has implemented<ref>{{Cite journal | url = https://devblogs.nvidia.com/fast-spectral-graph-partitioning-gpus/ | title = Fast Spectral Graph Partitioning on GPUs| journal = NVIDIA Developer Blog| year = 2016| last1 = Naumov| first1 = Maxim}}</ref> LOBPCG in its nvGRAPH library introduced in [[CUDA]] 8. Sphynx,<ref>{{Cite web | url=https://docs.trilinos.org/dev/packages/zoltan2/doc/html/sphynxPage.html |title = SGraph partitioning with Sphynx}}</ref> a hybrid distributed- and shared-memory-enabled parallel graph partitioner (the first graph partitioning tool that works on GPUs in distributed-memory settings), uses [[spectral clustering]] for [[graph partitioning]], computing eigenvectors on the [[Laplacian matrix]] of the graph using LOBPCG from the [[Anasazi]] package.

===[[Material sciences]]===
LOBPCG is implemented in [[ABINIT]]<ref>[https://docs.abinit.org/variables/dev/#wfoptalg ABINIT Docs: WaveFunction OPTimisation ALGorithm]</ref> (including [[CUDA]] version) and [[Octopus (software)|Octopus]].<ref>{{Cite web |url=http://octopus-code.org/wiki/Developers_Manual:LOBPCG |title=Octopus Developers Manual:LOBPCG |access-date=2018-07-29 |archive-date=2018-07-29 |archive-url=https://web.archive.org/web/20180729111709/http://octopus-code.org/wiki/Developers_Manual:LOBPCG |url-status=dead }}</ref> It has been used for multi-billion size matrices by [[Gordon Bell Prize]] finalists, on the [[Earth Simulator]] [[supercomputer]] in Japan.<ref>{{Cite conference| doi = 10.1109/SC.2005.1| title = 16.447 TFlops and 159-Billion-dimensional Exact-diagonalization for Trapped Fermion-Hubbard Model on the Earth Simulator| work = Proc. ACM/IEEE Conference on Supercomputing (SC'05)| pages = 44| year = 2005| last1 = Yamada | first1 = S.| last2 = Imamura | first2 = T.| last3 = Machida | first3 = M.| isbn = 1-59593-061-2}}</ref><ref>{{Cite conference| doi = 10.1145/1188455.1188504| title = Gordon Bell finalists I—High-performance computing for exact numerical approaches to quantum many-body problems on the earth simulator| conference = Proc. ACM/IEEE conference on Supercomputing (SC '06)| pages = 47| year = 2006| last1 = Yamada | first1 = S. | last2 = Imamura | first2 = T. | last3 = Kano | first3 = T. | last4 = Machida | first4 = M. | isbn = 0769527000}}</ref>
The [[Hubbard model]] for strongly correlated electron systems, studied to understand the mechanism behind [[superconductivity]], uses LOBPCG to calculate the [[ground state]] of the [[Hamiltonian (quantum mechanics)|Hamiltonian]] on the [[K computer]]<ref>{{Cite conference| doi = 10.1007/978-3-319-69953-0_14 | conference = Asian Conference on Supercomputing Frontiers | title = High Performance LOBPCG Method for Solving Multiple Eigenvalues of Hubbard Model: Efficiency of Communication Avoiding Neumann Expansion Preconditioner | work = Yokota R., Wu W. (eds) Supercomputing Frontiers. SCFA 2018. Lecture Notes in Computer Science, vol 10776. Springer, Cham| pages = 243–256| year = 2018| last1 = Yamada | first1 = S.| last2 = Imamura | first2 = T.| last3 = Machida | first3 = M.| doi-access = free}}</ref> and multi-GPU systems.<ref>{{Cite conference | conference = SupercomputingAsia (SCA) | title = High performance parallel LOBPCG method for large Hamiltonian derived from Hubbard model on multi-GPU systems | year = 2022| last1 = Yamada | first1 = S.| last2 = Imamura | first2 = T.| last3 = Machida | first3 = M.}}</ref>

There are [[MATLAB]]<ref>{{Cite journal| first1 = C. |last1 = Yang| first2 = J. C.|last2 = Meza| first3 = B.|last3 = Lee| first4 = L.-W.|last4 = Wang | title = KSSOLV - a MATLAB toolbox for solving the Kohn-Sham equations| journal = ACM Trans. Math. Softw.| volume = 36| pages = 1–35 | year = 2009|issue = 2| doi = 10.1145/1499096.1499099|s2cid = 624897}}</ref> and [[Julia (programming language)|Julia]]<ref>{{Cite journal | first1 = Fadjar |last1 = Fathurrahman | first2 = Mohammad Kemal |last2 = Agusta | first3 = Adhitya Gandaryus |last3 = Saputro | first4 = Hermawan Kresno |last4 = Dipojono| title = PWDFT.jl: A Julia package for electronic structure calculation using density functional theory and plane wave basis|journal = Computer Physics Communications | doi = 10.1016/j.cpc.2020.107372| year = 2020|volume = 256 |page = 107372 |bibcode = 2020CoPhC.25607372F |s2cid = 219517717 }}</ref><ref>[https://github.com/JuliaMolSim/DFTK.jl Density-functional toolkit (DFTK). Plane wave density functional theory in Julia]</ref>
versions of LOBPCG for [[Kohn-Sham]] equations and [[density functional theory]] (DFT) using the plane wave basis.
Recent implementations include TTPY,<ref>{{Cite journal | doi = 10.1063/1.4962420| pmid = 27782616| title = Calculating vibrational spectra of molecules using tensor train decomposition| journal = J. Chem. Phys. | volume = 145| year = 2016| issue = 12| pages = 124101| last1 = Rakhuba| first1 = Maxim | last2 = Oseledets | first2 = Ivan| bibcode = 2016JChPh.145l4101R| arxiv =1605.08422| s2cid = 44797395}}</ref> Platypus‐QM,<ref>{{Cite journal | doi = 10.1002/jcc.24318 | pmid = 26940542| title = Development of massive multilevel molecular dynamics simulation program, platypus (PLATform for dYnamic protein unified simulation), for the elucidation of protein functions| journal = J. Comput. Chem.| volume = 37| issue = 12| pages = 1125–1132| year = 2016| last1 = Takano| first1 = Yu | last2 = Nakata | first2 = Kazuto| last3 = Yonezawa | first3 = Yasushige| last4 = Nakamura | first4 = Haruki| pmc =4825406}}</ref> MFDn,<ref>{{Cite journal|arxiv=1609.01689 | title = Accelerating Nuclear Configuration Interaction Calculations through a Preconditioned Block Iterative Eigensolver| journal = Computer Physics Communications| volume = 222| issue = 1| pages = 1–13| year = 2018| last1 = Shao| first1 = Meiyue | display-authors = etal| doi = 10.1016/j.cpc.2017.09.004| bibcode = 2018CoPhC.222....1S| s2cid = 13996642}}</ref> ACE-Molecule,<ref>{{Cite journal| title = ACE-Molecule: An open-source real-space quantum chemistry package| journal = The Journal of Chemical Physics| volume = 152| issue = 12| pages = 124110| year = 2020| last1 = Kang| first1 = Sungwoo | display-authors = etal| doi = 10.1063/5.0002959| pmid = 32241122| bibcode = 2020JChPh.152l4110K| s2cid = 214768088| doi-access = free}}</ref> LACONIC.<ref>{{Cite report|last1=Baczewski|first1=Andrew David|last2=Brickson|first2=Mitchell Ian|last3=Campbell|first3=Quinn|last4=Jacobson|first4=Noah Tobias|last5=Maurer|first5=Leon| title =A Quantum Analog Coprocessor for Correlated Electron Systems Simulation|doi=10.2172/1671166|url=https://www.osti.gov/biblio/1671166|location=United States|date=2020-09-01|institution=Sandia National Lab. (SNL-NM)|osti=1671166}}</ref>

===[[Mechanics]] and [[fluid]]s===
LOBPCG from BLOPEX is used for [[preconditioner]] setup in Multilevel [[Balancing Domain Decomposition by Constraints]] (BDDC) solver library BDDCML, which is incorporated into OpenFTL (Open [[Finite element]] Template Library) and Flow123d simulator of underground water flow, solute and [[heat transport]] in fractured [[porous media]]. LOBPCG has been implemented<ref>{{Cite conference | url = https://www.dynalook.com/conferences/15th-international-ls-dyna-conference/implicit/a-survey-of-eigen-solution-methods-in-ls-dyna-r |
title = A Survey of Eigen Solution Methods in LS-DYNA | conference = 15th International LS-DYNA Conference, Detroit | year = 2018 }}</ref> in [[LS-DYNA]] and indirectly in [[ANSYS]].<ref>{{Cite web | url = https://lsdyna.ansys.com/wp-content/uploads/2023/12/LS-DYNA-R15-updates-Roger-Grimes-Thomas-Borrvall-Ansys.pdf |
title = LS-DYNA 2024R1 (R15.0) Recent Developments | year = 2024 | page = 15 }}</ref>

===[[Maxwell's equations]]===
LOBPCG is one of the core eigenvalue solvers in PYFEMax and the high-performance multiphysics [[finite element]] software Netgen/NGSolve. LOBPCG from [[hypre]] is incorporated into the [[open source]] lightweight scalable [[C++]] library for [[finite element]] methods [[MFEM]], which is used in many projects, including BLAST, XBraid, [[VisIt]], xSDK, the FASTMath institute in [[SciDAC]], and the co-design Center for Efficient Exascale Discretizations (CEED) in the [[Exascale computing]] Project.

===[[Denoising]]===
An iterative LOBPCG-based approximate [[low-pass filter]] can be used for [[denoising]], e.g., to accelerate [[total variation denoising]].<ref>{{Cite conference |
first1 = A. | last1 =Knyazev | first2 = A. | last2 =Malyshev | title = Accelerated graph-based spectral polynomial filters | conference = 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), Boston, MA | year = 2015 | pages = 1–6 | doi = 10.1109/MLSP.2015.7324315| arxiv = 1509.02468 }}</ref>

===[[Image segmentation]]===
[[Image segmentation]] via [[spectral clustering]] performs a low-dimensional [[embedding]] using an affinity matrix between pixels, followed by clustering of the components of the eigenvectors in the low-dimensional space, e.g., using the [[graph Laplacian]] for the [[bilateral filter]]. [[Image segmentation]] via spectral [[graph partitioning]] by LOBPCG with [[multigrid]] [[preconditioning]] was first proposed in 2003<ref>{{Cite conference | url = https://www.researchgate.net/publication/343531874 | title = Modern preconditioned eigensolvers for spectral image segmentation and graph bisection | conference = Clustering Large Data Sets; Third IEEE International Conference on Data Mining (ICDM 2003) Melbourne, Florida: IEEE Computer Society| editor = Boley| editor2 = Dhillon| editor3 = Ghosh| editor4 = Kogan | pages = 59–62| year = 2003| last1 = Knyazev| first1 = Andrew V.}}</ref> and actually tested in 2006.<ref>{{Cite conference | url = https://www.researchgate.net/publication/354448354 | title = Multiscale Spectral Image Segmentation Multiscale preconditioning for computing eigenvalues of graph Laplacians in image segmentation | conference = Fast Manifold Learning Workshop, WM Williamburg, VA| year = 2006| last1 = Knyazev| first1 = Andrew V. | doi=10.13140/RG.2.2.35280.02565}}</ref><ref>{{Cite conference | url = https://www.researchgate.net/publication/343531874 | title = Multiscale Spectral Graph Partitioning and Image Segmentation | conference = Workshop on Algorithms for Modern Massive Datasets Stanford University and Yahoo! Research| year = 2006| last1 = Knyazev| first1 = Andrew V.}}</ref> The latter approach was later implemented in the Python package [[scikit-learn]],<ref>{{Cite web|url=https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering|title=Spectral Clustering — scikit-learn documentation}}</ref> which uses LOBPCG from [[SciPy]] with [[Multigrid method#Algebraic multigrid (AMG)|algebraic multigrid preconditioning]] for solving the eigenvalue problem for the graph Laplacian.
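A toy [[scikit-learn]] sketch selecting LOBPCG as the eigensolver for spectral clustering; the data and parameters are illustrative, and <code>eigen_solver="amg"</code> would additionally enable the algebraic multigrid preconditioning (requiring the optional pyamg package).

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Two well-separated point clouds as a trivial stand-in for image pixels
points = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
                    rng.normal(5.0, 0.5, size=(100, 2))])

labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            eigen_solver="lobpcg",
                            random_state=0).fit_predict(points)
</syntaxhighlight>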


==References==
<references/>

==External links==
*[http://www.mathworks.com/matlabcentral/fileexchange/48-lobpcg-m LOBPCG] in [[MATLAB]]
*[http://octave.sourceforge.net/linear-algebra/function/lobpcg.html LOBPCG] in [[Octave]]
*[http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html LOBPCG] in [[SciPy]]
*[https://code.google.com/p/sparse-eigensolvers-java/ LOBPCG] in [[Java (programming language)|Java]] at [[Google Code]]
*[https://github.com/lobpcg/blopex LOBPCG in Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX)] at [[GitHub]] and [https://code.google.com/p/blopex/ archived] at [[Google Code]]


{{Numerical linear algebra}}


{{DEFAULTSORT:Lobpcg}}
[[Category:Numerical linear algebra]]
[[Category:Scientific simulation software]]

Latest revision as of 05:36, 21 December 2024

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a matrix-free method for finding the largest (or smallest) eigenvalues and the corresponding eigenvectors of a symmetric generalized eigenvalue problem

for a given pair of complex Hermitian or real symmetric matrices, where the matrix is also assumed positive-definite.

Background

[edit]

Kantorovich in 1948 proposed calculating the smallest eigenvalue of a symmetric matrix by steepest descent using a direction of a scaled gradient of a Rayleigh quotient in a scalar product , with the step size computed by minimizing the Rayleigh quotient in the linear span of the vectors and , i.e. in a locally optimal manner. Samokish[1] proposed applying a preconditioner to the residual vector to generate the preconditioned direction and derived asymptotic, as approaches the eigenvector, convergence rate bounds. D'yakonov suggested[2] spectrally equivalent preconditioning and derived non-asymptotic convergence rate bounds. Block locally optimal multi-step steepest descent for eigenvalue problems was described in.[3] Local minimization of the Rayleigh quotient on the subspace spanned by the current approximation, the current residual and the previous approximation, as well as its block version, appeared in.[4] The preconditioned version was analyzed in [5] and.[6]

Main features

[edit]

Source:[7]

  • Matrix-free, i.e. does not require storing the coefficient matrix explicitly, but can access the matrix by evaluating matrix-vector products.
  • Factorization-free, i.e. does not require any matrix decomposition even for a generalized eigenvalue problem.
  • The costs per iteration and the memory use are competitive with those of the Lanczos method, computing a single extreme eigenpair of a symmetric matrix.
  • Linear convergence is theoretically guaranteed and practically observed.
  • Accelerated convergence due to direct preconditioning, in contrast to the Lanczos method, including variable and non-symmetric as well as fixed and positive definite preconditioning.
  • Allows trivial incorporation of efficient domain decomposition and multigrid techniques via preconditioning.
  • Warm starts and computes an approximation to the eigenvector on every iteration.
  • More numerically stable compared to the Lanczos method, and can operate in low-precision computer arithmetic.
  • Easy to implement, with many versions already appeared.
  • Blocking allows utilizing highly efficient matrix-matrix operations, e.g., BLAS 3.
  • The block size can be tuned to balance convergence speed vs. computer costs of orthogonalizations and the Rayleigh-Ritz method on every iteration.

Algorithm

[edit]

Single-vector version

[edit]

Preliminaries: Gradient descent for eigenvalue problems

[edit]

The method performs an iterative maximization (or minimization) of the generalized Rayleigh quotient

which results in finding largest (or smallest) eigenpairs of

The direction of the steepest ascent, which is the gradient, of the generalized Rayleigh quotient is positively proportional to the vector

called the eigenvector residual. If a preconditioner is available, it is applied to the residual and gives the vector

called the preconditioned residual. Without preconditioning, we set and so . An iterative method

or, in short,

is known as preconditioned steepest ascent (or descent), where the scalar is called the step size. The optimal step size can be determined by maximizing the Rayleigh quotient, i.e.,

(or in case of minimizing), in which case the method is called locally optimal.

Three-term recurrence

[edit]

To dramatically accelerate the convergence of the locally optimal preconditioned steepest ascent (or descent), one extra vector can be added to the two-term recurrence relation to make it three-term:

(use in case of minimizing). The maximization/minimization of the Rayleigh quotient in a 3-dimensional subspace can be performed numerically by the Rayleigh–Ritz method. Adding more vectors, see, e.g., Richardson extrapolation, does not result in significant acceleration[8] but increases computation costs, so is not generally recommended.

Numerical stability improvements

[edit]

As the iterations converge, the vectors and become nearly linearly dependent, resulting in a precision loss and making the Rayleigh–Ritz method numerically unstable in the presence of round-off errors. The loss of precision may be avoided by substituting the vector with a vector , that may be further away from , in the basis of the three-dimensional subspace , while keeping the subspace unchanged and avoiding orthogonalization or any other extra operations.[8] Furthermore, orthogonalizing the basis of the three-dimensional subspace may be needed for ill-conditioned eigenvalue problems to improve stability and attainable accuracy.

Krylov subspace analogs

[edit]

This is a single-vector version of the LOBPCG method—one of possible generalization of the preconditioned conjugate gradient linear solvers to the case of symmetric eigenvalue problems.[8] Even in the trivial case and the resulting approximation with will be different from that obtained by the Lanczos algorithm, although both approximations will belong to the same Krylov subspace.

Practical use scenarios

[edit]

Extreme simplicity and high efficiency of the single-vector version of LOBPCG make it attractive for eigenvalue-related applications under severe hardware limitations, ranging from spectral clustering based real-time anomaly detection via graph partitioning on embedded ASIC or FPGA to modelling physical phenomena of record computing complexity on exascale TOP500 supercomputers.

Block version

[edit]

Summary

[edit]

Subsequent eigenpairs can be computed one-by-one via single-vector LOBPCG supplemented with an orthogonal deflation or simultaneously as a block. In the former approach, imprecisions in already computed approximate eigenvectors additively affect the accuracy of the subsequently computed eigenvectors, thus increasing the error with every new computation. Iterating several approximate eigenvectors together in a block in a locally optimal fashion in the block version of the LOBPCG.[8] allows fast, accurate, and robust computation of eigenvectors, including those corresponding to nearly-multiple eigenvalues where the single-vector LOBPCG suffers from slow convergence. The block size can be tuned to balance numerical stability vs. convergence speed vs. computer costs of orthogonalizations and the Rayleigh-Ritz method on every iteration.

Core design

[edit]

The block approach in LOBPCG replaces single-vectors and with block-vectors, i.e. matrices and , where, e.g., every column of approximates one of the eigenvectors. All columns are iterated simultaneously, and the next matrix of approximate eigenvectors is determined by the Rayleigh–Ritz method on the subspace spanned by all columns of matrices and . Each column of is computed simply as the preconditioned residual for every column of The matrix is determined such that the subspaces spanned by the columns of and of are the same.

Numerical stability vs. efficiency

[edit]

The outcome of the Rayleigh–Ritz method is determined by the subspace spanned by all columns of matrices and , where a basis of the subspace can theoretically be arbitrary. However, in inexact computer arithmetic the Rayleigh–Ritz method becomes numerically unstable if some of the basis vectors are approximately linearly dependent. Numerical instabilities typically occur, e.g., if some of the eigenvectors in the iterative block already reach attainable accuracy for a given computer precision and are especially prominent in low precision, e.g., single precision.

The art of multiple different implementation of LOBPCG is to ensure numerical stability of the Rayleigh–Ritz method at minimal computing costs by choosing a good basis of the subspace. The arguably most stable approach of making the basis vectors orthogonal, e.g., by the Gram–Schmidt process, is also the most computational expensive. For example, LOBPCG implementations,[9][10] utilize unstable but efficient Cholesky decomposition of the normal matrix, which is performed only on individual matrices and , rather than on the whole subspace. The constantly increasing amount of computer memory allows typical block sizes nowadays in the range, where the percentage of compute time spend on orthogonalizations and the Rayleigh-Ritz method starts dominating.

Locking of previously converged eigenvectors

[edit]

Block methods for eigenvalue problems that iterate subspaces commonly have some of the iterative eigenvectors converged faster than others that motivates locking the already converged eigenvectors, i.e., removing them from the iterative loop, in order to eliminate unnecessary computations and improve numerical stability. A simple removal of an eigenvector may likely result in forming its duplicate in still iterating vectors. The fact that the eigenvectors of symmetric eigenvalue problems are pair-wise orthogonal suggest keeping all iterative vectors orthogonal to the locked vectors.

Locking can be implemented differently maintaining numerical accuracy and stability while minimizing the compute costs. For example, LOBPCG implementations,[9][10] follow,[8][11] separating hard locking, i.e. a deflation by restriction, where the locked eigenvectors serve as a code input and do not change, from soft locking, where the locked vectors do not participate in the typically most expensive iterative step of computing the residuals, however, fully participate in the Rayleigh—Ritz method and thus are allowed to be changed by the Rayleigh—Ritz method.

Modifications, LOBPCG II

[edit]

LOBPCG includes all columns of matrices and into the Rayleigh–Ritz method resulting in an up to -by- eigenvalue problem needed to solve and up to dot products to compute at every iteration, where denotes the block size — the number of columns. For large block sizes this starts dominating compute and I/O costs and limiting parallelization, where multiple compute devices are running simultaneously.

The original LOBPCG paper[8] describes a modification, called LOBPCG II, to address such a problem running the single-vector version of the LOBPCG method for each desired eigenpair with the Rayleigh-Ritz procedure solving of 3-by-3 projected eigenvalue problems. The global Rayleigh-Ritz procedure for all eigenpairs is on every iteration but only on the columns of the matrix , thus reducing the number of the necessary dot products to from and the size of the global projected eigenvalue problem to -by- from -by- on every iteration. Reference [12] goes further applying the LOBPCG algorithm to each approximate eigenvector separately, i.e., running the unblocked version of the LOBPCG method for each desired eigenpair for a fixed number of iterations. The Rayleigh-Ritz procedures in these runs only need to solve a set of 3 × 3 projected eigenvalue problems. The global Rayleigh-Ritz procedure for all desired eigenpairs is only applied periodically at the end of a fixed number of unblocked LOBPCG iterations.

Such modifications may be less robust compared to the original LOBPCG. Individually running branches of the single-vector LOBPCG may not follow continuous iterative paths flipping instead and creating duplicated approximations to the same eigenvector. The single-vector LOBPCG may be unsuitable for clustered eigenvalues, but separate small-block LOBPCG runs require determining their block sizes automatically during the process of iterations since the number of the clusters of eigenvalues and their sizes may be unknown a priori.

Convergence theory and practice

[edit]

LOBPCG by construction is guaranteed[8] to minimize the Rayleigh quotient not slower than the block steepest gradient descent, which has a comprehensive convergence theory. Every eigenvector is a stationary point of the Rayleigh quotient, where the gradient vanishes. Thus, the gradient descent may slow down in a vicinity of any eigenvector, however, it is guaranteed to either converge to the eigenvector with a linear convergence rate or, if this eigenvector is a saddle point, the iterative Rayleigh quotient is more likely to drop down below the corresponding eigenvalue and start converging linearly to the next eigenvalue below. The worst value of the linear convergence rate has been determined[8] and depends on the relative gap between the eigenvalue and the rest of the matrix spectrum and the quality of the preconditioner, if present.

For a general matrix, there is evidently no way to predict the eigenvectors and thus generate the initial approximations that always work well. The iterative solution by LOBPCG may be sensitive to the initial eigenvectors approximations, e.g., taking longer to converge slowing down as passing intermediate eigenpairs. Moreover, in theory, one cannot guarantee convergence necessarily to the smallest eigenpair, although the probability of the miss is zero. A good quality random Gaussian function with the zero mean is commonly the default in LOBPCG to generate the initial approximations. To fix the initial approximations, one can select a fixed seed for the random number generator.

In contrast to the Lanczos method, LOBPCG rarely exhibits asymptotic superlinear convergence in practice.

LOBPCG can be trivially adapted for computing several largest singular values and the corresponding singular vectors (partial SVD), e.g., for iterative computation of PCA, for a data matrix D with zero mean, without explicitly computing the covariance matrix DTD, i.e. in matrix-free fashion. The main calculation is evaluation of a function of the product DT(D X) of the covariance matrix DTD and the block-vector X that iteratively approximates the desired singular vectors. PCA needs the largest eigenvalues of the covariance matrix, while LOBPCG is typically implemented to calculate the smallest ones. A simple work-around is to negate the function, substituting -DT(D X) for DT(D X) and thus reversing the order of the eigenvalues, since LOBPCG does not care if the matrix of the eigenvalue problem is positive definite or not.[9]

LOBPCG for PCA and SVD is implemented in SciPy since revision 1.4.0[13]

General software implementations

[edit]

LOBPCG's inventor, Andrew Knyazev, published a reference implementation called Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX)[14][15] with interfaces to PETSc, hypre, and Parallel Hierarchical Adaptive MultiLevel method (PHAML).[16] Other implementations are available in, e.g., GNU Octave,[17] MATLAB (including for distributed or tiling arrays),[9] Java,[18] Anasazi (Trilinos),[19] SLEPc,[20][21] SciPy ,[10] Julia,[22] MAGMA,[23] Pytorch,[24] Rust,[25] OpenMP and OpenACC,[26] CuPy (A NumPy-compatible array library accelerated by CUDA),[27] Google JAX,[28] and NVIDIA AMGX.[29] LOBPCG is implemented,[30] but not included, in TensorFlow.

Applications

The software packages scikit-learn and Megaman[31] use LOBPCG to scale spectral clustering[32] and manifold learning[33] via Laplacian eigenmaps to large data sets. NVIDIA has implemented[34] LOBPCG in its nvGRAPH library, introduced in CUDA 8. Sphynx,[35] a hybrid distributed- and shared-memory-enabled parallel graph partitioner (the first graph partitioning tool that works on GPUs in distributed-memory settings), uses spectral clustering for graph partitioning, computing eigenvectors of the graph Laplacian matrix with LOBPCG from the Anasazi package.
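
A minimal sketch of selecting LOBPCG as the eigensolver in scikit-learn's spectral clustering (the two-moons data set and parameter values are illustrative):

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# eigen_solver='lobpcg' solves the Laplacian eigenproblem with SciPy's LOBPCG.
labels = SpectralClustering(
    n_clusters=2,
    affinity='nearest_neighbors',
    eigen_solver='lobpcg',
    random_state=0,
).fit_predict(X)
```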

LOBPCG is implemented in ABINIT[36] (including its CUDA version) and in Octopus.[37] It has been used for multi-billion-size matrices by Gordon Bell Prize finalists on the Earth Simulator supercomputer in Japan.[38][39] Studies of the Hubbard model for strongly correlated electron systems, aimed at understanding the mechanism behind superconductivity, use LOBPCG to calculate the ground state of the Hamiltonian on the K computer[40] and on multi-GPU systems.[41]

There are MATLAB[42] and Julia[43][44] versions of LOBPCG for Kohn-Sham equations and density functional theory (DFT) using the plane-wave basis. Recent implementations include TTPY,[45] Platypus‐QM,[46] MFDn,[47] ACE-Molecule,[48] and LACONIC.[49]

LOBPCG from BLOPEX is used for preconditioner setup in the Multilevel Balancing Domain Decomposition by Constraints (BDDC) solver library BDDCML, which is incorporated into OpenFTL (Open Finite element Template Library) and into the Flow123d simulator of underground water flow, solute, and heat transport in fractured porous media. LOBPCG has been implemented[50] in LS-DYNA and, indirectly, in ANSYS.[51]

LOBPCG is one of the core eigenvalue solvers in PYFEMax and in the high-performance multiphysics finite element software Netgen/NGSolve. LOBPCG from hypre is incorporated into MFEM, an open-source lightweight scalable C++ library for finite element methods, which is used in many projects, including BLAST, XBraid, VisIt, xSDK, the FASTMath institute in SciDAC, and the co-design Center for Efficient Exascale Discretizations (CEED) in the Exascale Computing Project.

An iterative LOBPCG-based approximate low-pass filter can be used for denoising, e.g., to accelerate total variation denoising.[52]

Image segmentation via spectral clustering performs a low-dimension embedding using an affinity matrix between pixels, followed by clustering of the components of the eigenvectors in the low-dimensional space, e.g., using the graph Laplacian for the bilateral filter. Image segmentation via spectral graph partitioning by LOBPCG with multigrid preconditioning was first proposed in [53] and actually tested in [54] and [55]. The latter approach was later implemented in the Python package scikit-learn,[56] which uses LOBPCG from SciPy with algebraic multigrid preconditioning for solving the eigenvalue problem for the graph Laplacian.
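
A sketch of this scikit-learn pipeline on a tiny synthetic image (eigen_solver='amg' would enable the algebraic multigrid preconditioning described above but requires the optional pyamg package; the sketch uses eigen_solver='lobpcg' to stay self-contained):

```python
import numpy as np
from sklearn.feature_extraction.image import img_to_graph
from sklearn.cluster import spectral_clustering

# Tiny synthetic image: two bright squares on a dark background.
img = np.zeros((50, 50))
img[10:20, 10:20] = 1.0
img[30:40, 30:40] = 1.0

graph = img_to_graph(img)  # pixel affinity graph built from image gradients
graph.data = np.exp(-graph.data / graph.data.std())  # gradients -> similarities

labels = spectral_clustering(
    graph, n_clusters=3, eigen_solver='lobpcg', random_state=0
)
segmented = labels.reshape(img.shape)  # per-pixel segment labels
```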

References

  1. ^ Samokish, B.A. (1958). "The steepest descent method for an eigenvalue problem with semi-bounded operators". Izvestiya Vuzov, Math. (5): 105–114.
  2. ^ D'yakonov, E. G. (1996). Optimization in solving elliptic problems. CRC-Press. p. 592. ISBN 978-0-8493-2872-5.
  3. ^ Cullum, Jane K.; Willoughby, Ralph A. (2002). Lanczos algorithms for large symmetric eigenvalue computations. Vol. 1 (Reprint of the 1985 original). Society for Industrial and Applied Mathematics.
  4. ^ Knyazev, Andrew V. (1987). "Convergence rate estimates for iterative methods for mesh symmetric eigenvalue problem". Soviet Journal of Numerical Analysis and Mathematical Modelling. 2 (5): 371–396. doi:10.1515/rnam.1987.2.5.371. S2CID 121473545.
  5. ^ Knyazev, A. V. (1991). "A Preconditioned Conjugate Gradient Method for Eigenvalue Problems and its Implementation in a Subspace". In Albrecht, J.; Collatz, L.; Hagedorn, P.; Velte, W. (eds.). Numerical Treatment of Eigenvalue Problems Vol. 5. International Series of Numerical Mathematics. Vol. 96. pp. 143–154. doi:10.1007/978-3-0348-6332-2_11. ISBN 978-3-0348-6334-6.
  6. ^ Knyazev, Andrew V. (1998). "Preconditioned eigensolvers - an oxymoron?". Electronic Transactions on Numerical Analysis. 7: 104–123.
  7. ^ Knyazev, Andrew (2017). "Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method (LOBPCG)". arXiv:1708.08354 [cs.NA].
  8. ^ a b c d e f g h Knyazev, Andrew V. (2001). "Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method". SIAM Journal on Scientific Computing. 23 (2): 517–541. Bibcode:2001SJSC...23..517K. doi:10.1137/S1064827500366124. S2CID 7077751.
  9. ^ a b c d MATLAB File Exchange function LOBPCG
  10. ^ a b c SciPy sparse linear algebra function lobpcg
  11. ^ Knyazev, A. (2004). Hard and soft locking in iterative methods for symmetric eigenvalue problems. Eighth Copper Mountain Conference on Iterative Methods March 28 - April 2, 2004. doi:10.13140/RG.2.2.11794.48327.
  12. ^ Vecharynski, E.; Yang, C.; Pask, J.E. (2015). "A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a hermitian matrix". J. Comput. Phys. 290: 73–89. arXiv:1407.7506. Bibcode:2015JCoPh.290...73V. doi:10.1016/j.jcp.2015.02.030. S2CID 43741860.
  13. ^ LOBPCG for SVDS in SciPy
  14. ^ GitHub BLOPEX
  15. ^ Knyazev, A. V.; Argentati, M. E.; Lashuk, I.; Ovtchinnikov, E. E. (2007). "Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in Hypre and PETSc". SIAM Journal on Scientific Computing. 29 (5): 2224. arXiv:0705.2626. Bibcode:2007SJSC...29.2224K. doi:10.1137/060661624. S2CID 266.
  16. ^ PHAML BLOPEX interface to LOBPCG
  17. ^ Octave linear-algebra function lobpcg
  18. ^ Java LOBPCG at Google Code
  19. ^ Anasazi Trilinos LOBPCG at GitHub
  20. ^ Native SLEPc LOBPCG
  21. ^ SLEPc BLOPEX interface to LOBPCG
  22. ^ Julia LOBPCG at GitHub
  23. ^ Anzt, Hartwig; Tomov, Stanimir; Dongarra, Jack (2015). "Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product". Proceedings of the Symposium on High Performance Computing (HPC '15). Society for Computer Simulation International, San Diego, CA, USA. HPC '15: 75–82. ISBN 9781510801011.
  24. ^ Pytorch LOBPCG at GitHub
  25. ^ Rust LOBPCG at GitHub
  26. ^ Rabbi, Fazlay; Daley, Christopher S.; Aktulga, Hasan M.; Wright, Nicholas J. (2019). Evaluation of Directive-based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices (PDF). Seventh Workshop on Accelerator Programming Using Directives, SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis.
  27. ^ CuPy: A NumPy-compatible array library accelerated by CUDA LOBPCG at GitHub
  28. ^ Google JAX LOBPCG initial merge at GitHub
  29. ^ NVIDIA AMGX LOBPCG at GitHub
  30. ^ Rakhuba, Maxim; Novikov, Alexander; Osedelets, Ivan (2019). "Low-rank Riemannian eigensolver for high-dimensional Hamiltonians". Journal of Computational Physics. 396: 718–737. arXiv:1811.11049. Bibcode:2019JCoPh.396..718R. doi:10.1016/j.jcp.2019.07.003. S2CID 119679555.
  31. ^ McQueen, James; et al. (2016). "Megaman: Scalable Manifold Learning in Python". Journal of Machine Learning Research. 17 (148): 1–5. Bibcode:2016JMLR...17..148M.
  32. ^ "Sklearn.cluster.SpectralClustering — scikit-learn 0.22.1 documentation".
  33. ^ "Sklearn.manifold.spectral_embedding — scikit-learn 0.22.1 documentation".
  34. ^ Naumov, Maxim (2016). "Fast Spectral Graph Partitioning on GPUs". NVIDIA Developer Blog.
  35. ^ "SGraph partitioning with Sphynx".
  36. ^ ABINIT Docs: WaveFunction OPTimisation ALGorithm
  37. ^ "Octopus Developers Manual:LOBPCG". Archived from the original on 2018-07-29. Retrieved 2018-07-29.
  38. ^ Yamada, S.; Imamura, T.; Machida, M. (2005). 16.447 TFlops and 159-Billion-dimensional Exact-diagonalization for Trapped Fermion-Hubbard Model on the Earth Simulator. Proc. ACM/IEEE Conference on Supercomputing (SC'05). p. 44. doi:10.1109/SC.2005.1. ISBN 1-59593-061-2.
  39. ^ Yamada, S.; Imamura, T.; Kano, T.; Machida, M. (2006). Gordon Bell finalists I—High-performance computing for exact numerical approaches to quantum many-body problems on the earth simulator. Proc. ACM/IEEE conference on Supercomputing (SC '06). p. 47. doi:10.1145/1188455.1188504. ISBN 0769527000.
  40. ^ Yamada, S.; Imamura, T.; Machida, M. (2018). High Performance LOBPCG Method for Solving Multiple Eigenvalues of Hubbard Model: Efficiency of Communication Avoiding Neumann Expansion Preconditioner. Asian Conference on Supercomputing Frontiers. Yokota R., Wu W. (eds) Supercomputing Frontiers. SCFA 2018. Lecture Notes in Computer Science, vol 10776. Springer, Cham. pp. 243–256. doi:10.1007/978-3-319-69953-0_14.
  41. ^ Yamada, S.; Imamura, T.; Machida, M. (2022). High performance parallel LOBPCG method for large Hamiltonian derived from Hubbard model on multi-GPU systems. SupercomputingAsia (SCA).
  42. ^ Yang, C.; Meza, J. C.; Lee, B.; Wang, L.-W. (2009). "KSSOLV - a MATLAB toolbox for solving the Kohn-Sham equations". ACM Trans. Math. Softw. 36 (2): 1–35. doi:10.1145/1499096.1499099. S2CID 624897.
  43. ^ Fathurrahman, Fadjar; Agusta, Mohammad Kemal; Saputro, Adhitya Gandaryus; Dipojono, Hermawan Kresno (2020). "PWDFT.jl: A Julia package for electronic structure calculation using density functional theory and plane wave basis". Computer Physics Communications. 256: 107372. Bibcode:2020CoPhC.25607372F. doi:10.1016/j.cpc.2020.107372. S2CID 219517717.
  44. ^ Density-functional toolkit (DFTK). Plane wave density functional theory in Julia
  45. ^ Rakhuba, Maxim; Oseledets, Ivan (2016). "Calculating vibrational spectra of molecules using tensor train decomposition". J. Chem. Phys. 145 (12): 124101. arXiv:1605.08422. Bibcode:2016JChPh.145l4101R. doi:10.1063/1.4962420. PMID 27782616. S2CID 44797395.
  46. ^ Takano, Yu; Nakata, Kazuto; Yonezawa, Yasushige; Nakamura, Haruki (2016). "Development of massive multilevel molecular dynamics simulation program, platypus (PLATform for dYnamic protein unified simulation), for the elucidation of protein functions". J. Comput. Chem. 37 (12): 1125–1132. doi:10.1002/jcc.24318. PMC 4825406. PMID 26940542.
  47. ^ Shao, Meiyue; et al. (2018). "Accelerating Nuclear Configuration Interaction Calculations through a Preconditioned Block Iterative Eigensolver". Computer Physics Communications. 222 (1): 1–13. arXiv:1609.01689. Bibcode:2018CoPhC.222....1S. doi:10.1016/j.cpc.2017.09.004. S2CID 13996642.
  48. ^ Kang, Sungwoo; et al. (2020). "ACE-Molecule: An open-source real-space quantum chemistry package". The Journal of Chemical Physics. 152 (12): 124110. Bibcode:2020JChPh.152l4110K. doi:10.1063/5.0002959. PMID 32241122. S2CID 214768088.
  49. ^ Baczewski, Andrew David; Brickson, Mitchell Ian; Campbell, Quinn; Jacobson, Noah Tobias; Maurer, Leon (2020-09-01). A Quantum Analog Coprocessor for Correlated Electron Systems Simulation (Report). United States: Sandia National Lab. (SNL-NM). doi:10.2172/1671166. OSTI 1671166.
  50. ^ A Survey of Eigen Solution Methods in LS-DYNA. 15th International LS-DYNA Conference, Detroit. 2018.
  51. ^ "LS-DYNA 2024R1 (R15.0) Recent Developments" (PDF). 2024. p. 15.
  52. ^ Knyazev, A.; Malyshev, A. (2015). Accelerated graph-based spectral polynomial filters. 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), Boston, MA. pp. 1–6. arXiv:1509.02468. doi:10.1109/MLSP.2015.7324315.
  53. ^ Knyazev, Andrew V. (2003). Boley; Dhillon; Ghosh; Kogan (eds.). Modern preconditioned eigensolvers for spectral image segmentation and graph bisection. Clustering Large Data Sets; Third IEEE International Conference on Data Mining (ICDM 2003) Melbourne, Florida: IEEE Computer Society. pp. 59–62.
  54. ^ Knyazev, Andrew V. (2006). Multiscale Spectral Image Segmentation Multiscale preconditioning for computing eigenvalues of graph Laplacians in image segmentation. Fast Manifold Learning Workshop, WM Williamburg, VA. doi:10.13140/RG.2.2.35280.02565.
  55. ^ Knyazev, Andrew V. (2006). Multiscale Spectral Graph Partitioning and Image Segmentation. Workshop on Algorithms for Modern Massive Datasets Stanford University and Yahoo! Research.
  56. ^ "Spectral Clustering — scikit-learn documentation".