Quasi-Newton method
In optimization, quasi-Newton methods (also known as variable metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method for finding the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point.
In quasi-Newton methods the Hessian matrix of second derivatives of the function to be minimized does not need to be computed; the Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method for finding the root of the first derivative to multidimensional problems. In more than one dimension the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
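As a minimal illustration, in one dimension the secant equation determines the scalar Hessian approximation uniquely,

    B_{k+1} = \frac{f'(x_{k+1}) - f'(x_k)}{x_{k+1} - x_k},

and the resulting step x_{k+2} = x_{k+1} - f'(x_{k+1})/B_{k+1} is exactly the secant method applied to f'.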
The first quasi-Newton algorithm was proposed in 1959 by W.C. Davidon, a physicist working at Argonne National Laboratory: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963 but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for symmetric rank-one), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension, L-BFGS. The Broyden class is a linear combination of the DFP and BFGS methods.
The SR1 formula does not guarantee that the update matrix maintains positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).
Description of the method
As in Newton's method, one uses a second-order approximation to find the minimum of a function f(x). The Taylor series of f(x) around an iterate x_k is

    f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^T \Delta x + \tfrac{1}{2} \Delta x^T B \, \Delta x,

where \nabla f is the gradient and B an approximation to the Hessian matrix. The gradient of this approximation (with respect to \Delta x) is

    \nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \, \Delta x,

and setting this gradient to zero provides the Newton step:

    \Delta x = -B^{-1} \nabla f(x_k).

The Hessian approximation B is chosen to satisfy

    \nabla f(x_k + \Delta x) = \nabla f(x_k) + B \, \Delta x,

which is called the secant equation (the Taylor series of the gradient itself). In more than one dimension B is under-determined. In one dimension, solving for B and applying the Newton step with the updated value is equivalent to the secant method. Various methods are used to find the solution of the secant equation that is symmetric (B^T = B) and closest to the current approximate value B_k according to some metric \min_B \|B - B_k\|. An approximate initial value B_0 = \beta I (a multiple of the identity) is often sufficient to achieve rapid convergence. The unknown x_k is updated by applying the Newton step calculated using the current approximate Hessian matrix B_k:

- \Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k), with \alpha_k chosen to satisfy the Wolfe conditions;
- x_{k+1} = x_k + \Delta x_k;
- the gradient computed at the new point, \nabla f(x_{k+1}), together with

      y_k = \nabla f(x_{k+1}) - \nabla f(x_k),

  is used to update the approximate Hessian B_{k+1}, or directly its inverse H_{k+1} = B_{k+1}^{-1} using the Sherman–Morrison formula.
- A key property of the BFGS and DFP updates is that if B_k is positive definite and \alpha_k is chosen to satisfy the Wolfe conditions, then B_{k+1} is also positive definite.
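As a concrete illustration of these steps, here is a minimal sketch (the quadratic objective, starting point, and unit step length are illustrative assumptions, not part of the article) that performs one quasi-Newton iteration with the BFGS update and verifies that the updated matrix satisfies the secant equation:

% One quasi-Newton (BFGS) iteration on a quadratic f(x) = 0.5*x'*A*x - b'*x,
% whose gradient is A*x - b (A and b are illustrative choices).
A  = [3 1; 1 2];   b = [1; 1];
g  = @(x) A*x - b;          % gradient of the quadratic
xk = [0; 0];                % current iterate
Bk = eye(2);                % initial Hessian approximation
dx  = -Bk \ g(xk);          % quasi-Newton step (step length alpha = 1)
xk1 = xk + dx;              % new iterate
yk  = g(xk1) - g(xk);       % change in gradient
% BFGS update of the Hessian approximation
Bk1 = Bk + (yk*yk')/(yk'*dx) - (Bk*(dx*dx')*Bk)/(dx'*Bk*dx);
% The update satisfies the secant equation B_{k+1}*dx = y_k
disp(norm(Bk1*dx - yk))     % prints (numerically) zero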
The most popular update formulas are:
| Method | B_{k+1} | H_{k+1} = B_{k+1}^{-1} |
|---|---|---|
| DFP | \left(I - \frac{y_k \Delta x_k^T}{y_k^T \Delta x_k}\right) B_k \left(I - \frac{\Delta x_k y_k^T}{y_k^T \Delta x_k}\right) + \frac{y_k y_k^T}{y_k^T \Delta x_k} | H_k + \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} |
| BFGS | B_k + \frac{y_k y_k^T}{y_k^T \Delta x_k} - \frac{B_k \Delta x_k \Delta x_k^T B_k}{\Delta x_k^T B_k \Delta x_k} | \left(I - \frac{\Delta x_k y_k^T}{y_k^T \Delta x_k}\right) H_k \left(I - \frac{y_k \Delta x_k^T}{y_k^T \Delta x_k}\right) + \frac{\Delta x_k \Delta x_k^T}{y_k^T \Delta x_k} |
| Broyden | B_k + \frac{(y_k - B_k \Delta x_k)\, \Delta x_k^T}{\Delta x_k^T \Delta x_k} | H_k + \frac{(\Delta x_k - H_k y_k)\, \Delta x_k^T H_k}{\Delta x_k^T H_k y_k} |
| Broyden family | (1 - \varphi_k)\, B_{k+1}^{BFGS} + \varphi_k\, B_{k+1}^{DFP}, \quad \varphi_k \in [0, 1] | |
| SR1 | B_k + \frac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^T}{(y_k - B_k \Delta x_k)^T \Delta x_k} | H_k + \frac{(\Delta x_k - H_k y_k)(\Delta x_k - H_k y_k)^T}{(\Delta x_k - H_k y_k)^T y_k} |
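Several of the formulas above update the inverse H_{k+1} = B_{k+1}^{-1} directly, so the next search direction can be obtained with a matrix-vector product instead of a linear solve. A minimal sketch of the inverse BFGS update (the function name and interface are illustrative, not from the article):

function H = BFGS_InverseUpdate(H, dx, y)
% BFGS update of the inverse Hessian approximation H:
%   H+ = (I - rho*dx*y') * H * (I - rho*y*dx') + rho*(dx*dx'),  rho = 1/(y'*dx),
% where dx is the step taken and y the change in gradient (both column vectors).
rho = 1/(y'*dx);
I   = eye(length(dx));
H   = (I - rho*(dx*y')) * H * (I - rho*(y*dx')) + rho*(dx*dx');
end

The next search direction is then simply d = -H*grad, avoiding the linear solve -B\grad needed when the Hessian approximation itself is stored.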
Implementations
The NAG Library contains several routines[1] for minimizing or maximizing a function[2] which use quasi-Newton algorithms.
Here is a MATLAB example that uses the BFGS method (two helper routines it calls are sketched after the listing).
%***********************************************************************%
% Usage: [x,Iter,FunEval,EF] = Quasi_Newton (fun,x0,MaxIter,epsg,epsx)
% fun: name of the multidimensional scalar objective function
% (string). This function takes a vector argument of length
% n and returns a scalar.
% x0: starting point (row vector of length n).
% MaxIter: maximum number of iterations to find a solution.
% epsg: maximum acceptable Euclidean norm of the gradient of the
% objective function at the solution found.
% epsx: minimum relative change in the optimization variables x.
% x: solution found (row vector of length n).
% Iter: number of iterations needed to find the solution.
% FunEval: number of function evaluations needed.
% EF: exit flag,
% EF=1: successful optimization (gradient is small enough).
% EF=2: algorithm converged (relative change in x is small
% enough).
% EF=-1: maximum number of iterations exceeded.
%        Quasi-Newton optimization algorithm using the BFGS update          %
function [x,Iter,FunEval,EF] = Quasi_Newton (fun, x0, MaxIter, epsg, epsx)
% Variable Declaration
xi = zeros(MaxIter+1,size(x0,2));
xi(1,:) = x0;
Bi = eye(size(x0,2));
% BFGS quasi-Newton iteration
FunEval = 0;
EF = 0;
for i = 1:MaxIter
%Calculate Gradient around current point
[GradOfU,Eval] = Grad (fun, xi(i,:));
FunEval = FunEval + Eval; %Update function evaluation
%Calculate search direction
di = -Bi\GradOfU ;
%Calculate Alfa via exact line search
[alfa, Eval] = LineSearchAlfa(fun,xi(i,:),di);
FunEval = FunEval + Eval; %Update function evaluation
%Calculate Next solution of X
xi(i+1,:) = xi(i,:) + (alfa*di)';
% Calculate Gradient of X on i+1
[Grad_Next, Eval] = Grad (fun, xi(i+1,:));
FunEval = FunEval + Eval; %Update function evaluation
%Calculate new Hessian approximation B using the BFGS formula
Bi = BFGS(fun,GradOfU,Grad_Next,xi(i+1,:),xi(i,:), Bi);
% Calculate maximum acceptable Euclidean norm of the gradient
if norm(Grad_Next,2) < epsg
EF = 1;
break
end
% Calculate minimum relative change in the optimization variables
E = xi(i+1,:)- xi(i,:);
if norm(E,2) < epsx
EF = 2;
break
end
end
% report optimum solution
x = xi(i+1,:);
if i == MaxIter && EF == 0
    % exit flag: maximum number of iterations reached without convergence
    EF = -1;
end
% report number of iterations performed
Iter = i;
end
%***********************************************************************%
% Broyden, Fletcher, Goldfarb and Shanno (BFGS) formula
%***********************************************************************%
function B = BFGS(fun,GradOfU,Grad_Next,Xi_next,Xi,Bi)
% Calculate Si term
si = Xi_next - Xi;
% Calculate Yi term
yi = Grad_Next - GradOfU;
%
% BFGS formula (Broyden, Fletcher, Goldfarb and Shanno)
%
B = Bi - ((Bi*si'*si*Bi)/(si*Bi*si')) + ((yi*yi')/(yi'*si'));
end
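The listing above calls two helper routines, Grad and LineSearchAlfa, that are not included. Here is a minimal sketch of both, assuming a forward-difference gradient and a simple backtracking line search (the original comments refer to an exact line search, which is not shown; these bodies are assumptions made only to make the example runnable):

%***********************************************************************%
% Helper sketches (assumed implementations, matching the calls above)
%***********************************************************************%
function [g, Eval] = Grad(fun, x)
% Forward-difference approximation of the gradient of fun at the row
% vector x; returns g as a column vector and the number of function
% evaluations used.
n  = length(x);
g  = zeros(n,1);
h  = 1e-7;
f0 = fun(x);
for k = 1:n
    xk    = x;
    xk(k) = xk(k) + h;
    g(k)  = (fun(xk) - f0)/h;
end
Eval = n + 1;
end

function [alfa, Eval] = LineSearchAlfa(fun, x, d)
% Backtracking line search along the (column) direction d from the row
% vector x: halve the step length until the objective decreases.
alfa = 1;
f0   = fun(x);
Eval = 1;
while fun(x + (alfa*d)') >= f0 && alfa > 1e-12
    alfa = alfa/2;
    Eval = Eval + 1;
end
Eval = Eval + 1;   % count the evaluation that ends the loop
end

With these helpers on the path, the example can be run on, for instance, a simple (hypothetical) quadratic objective:

fun = @(x) x*x';                                     % test function, x a row vector
[x, Iter, FunEval, EF] = Quasi_Newton(fun, [1 2], 100, 1e-6, 1e-9);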
See also

- Broyden's method
References
- ^ The Numerical Algorithms Group. "Keyword Index: Quasi-Newton". NAG Library Manual, Mark 23. Retrieved 2012-02-09.
- ^ The Numerical Algorithms Group. "E04 – Minimizing or Maximizing a Function" (PDF). NAG Library Manual, Mark 23. Retrieved 2012-02-09.
Further reading
- Bonnans, J. F., Gilbert, J.Ch., Lemaréchal, C. and Sagastizábal, C.A. (2006), Numerical optimization, theoretical and numerical aspects. Second edition. Springer. ISBN 978-3-540-35445-1.
- William C. Davidon, Variable Metric Method for Minimization, SIAM Journal on Optimization, Volume 1, Issue 1, pp. 1–17, 1991.
- Fletcher, Roger (1987), Practical methods of optimization (2nd ed.), New York: John Wiley & Sons, ISBN 978-0-471-91547-8.
- Nocedal, Jorge & Wright, Stephen J. (1999). Numerical Optimization. Springer-Verlag. ISBN 0-387-98793-2.
- Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007). "Section 10.9. Quasi-Newton or Variable Metric Methods in Multidimensions". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8.