Line search

From Wikipedia, the free encyclopedia
{{Short description|Optimization algorithm}}
{{Distinguish|linear search}}


In [[optimization (mathematics)|optimization]], '''line search''' is a basic [[iteration|iterative]] approach to find a [[maxima and minima|local minimum]] <math>\mathbf{x}^*</math> of an [[objective function]] <math>f:\mathbb R^n\to\mathbb R</math>. It first finds a [[descent direction]] along which the objective function <math>f</math> will be reduced, and then computes a step size that determines how far <math>\mathbf{x}</math> should move along that direction. The descent direction can be computed by various methods, such as [[gradient descent]] or [[quasi-Newton method]]. The step size can be determined either exactly or inexactly.
== One-dimensional line search ==
Suppose ''f'' is a one-dimensional function, <math>f:\mathbb R\to\mathbb R</math>, and assume that it is [[unimodal]], that is, contains exactly one local minimum ''x''* in a given interval [''a'',''z'']. This means that ''f'' is strictly decreasing in [''a'',''x''*] and strictly increasing in [''x''*,''z'']. There are several ways to find an (approximate) minimum point in this case.<ref name=":0">{{Cite web |last=Nemirovsky and Ben-Tal |date=2023 |title=Optimization III: Convex Optimization |url=http://www2.isye.gatech.edu/~nemirovs/OPTIIILN2023Spring.pdf}}</ref>{{Rp|location=sec.5}}


=== Zero-order methods ===
Zero-order methods use only function evaluations (i.e., a [[value oracle]]), not derivatives:<ref name=":0" />{{Rp|location=sec.5}}
* [[Ternary search]]: pick two points ''b'',''c'' such that ''a''<''b''<''c''<''z''. If f(''b'')≤f(''c''), then ''x''* must be in [''a'',''c'']; if f(''b'')≥f(''c''), then ''x''* must be in [''b'',''z'']. In both cases, we can replace the search interval with a smaller one. If we pick ''b'',''c'' very close to the interval center, then the interval shrinks to ~1/2 of its length at each iteration, but we need two function evaluations per iteration. Therefore, the method has [[linear convergence]] with rate <math>\sqrt{0.5}\approx 0.71</math> per function evaluation. If we pick ''b'',''c'' such that the partition ''a'',''b'',''c'',''z'' has three equal-length intervals, then the interval shrinks to 2/3 of its length at each iteration, so the method has [[linear convergence]] with rate <math>\sqrt{2/3}\approx 0.82</math> per function evaluation.
* Fibonacci search: This is a variant of ternary search in which the points ''b'',''c'' are selected based on the [[Fibonacci sequence]]. At each iteration, only one function evaluation is needed, since the other point was already an endpoint of a previous interval. Therefore, the method has linear convergence with rate <math>1/\varphi \approx 0.618</math>.
* [[Golden-section search]]: This is a variant in which the points ''b'',''c'' are selected based on the [[golden ratio]]. Again, only one function evaluation is needed in each iteration, and the method has linear convergence with rate <math>1/\varphi \approx 0.618</math>. This ratio is optimal among zero-order methods.
Zero-order methods are very general: they do not assume differentiability or even continuity.
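For illustration, golden-section search can be sketched as follows (a minimal Python sketch, not taken from the cited source; ''f'' is assumed unimodal on [''a'',''z'']):

```python
import math

def golden_section_search(f, a, z, tol=1e-8):
    """Minimize a unimodal function f on [a, z] by golden-section search.

    Each iteration reuses one of the two interior points, so only one new
    function evaluation is needed per step, and the interval shrinks by a
    factor of 1/phi ~ 0.618 per iteration.
    """
    invphi = (math.sqrt(5) - 1) / 2   # 1/phi ~ 0.618
    b = z - invphi * (z - a)          # left interior point
    c = a + invphi * (z - a)          # right interior point
    fb, fc = f(b), f(c)
    while z - a > tol:
        if fb <= fc:                  # minimum lies in [a, c]
            z, c, fc = c, b, fb       # old b becomes the new right point
            b = z - invphi * (z - a)
            fb = f(b)
        else:                         # minimum lies in [b, z]
            a, b, fb = b, c, fc       # old c becomes the new left point
            c = a + invphi * (z - a)
            fc = f(c)
    return (a + z) / 2
```

Because only function values are used, this works even for non-differentiable unimodal functions such as <math>|x+1|</math>.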


=== First-order methods ===
First-order methods assume that ''f'' is continuously differentiable, and that we can evaluate not only ''f'' but also its derivative.<ref name=":0" />{{Rp|location=sec.5}}

* The [[bisection method]] computes the derivative of ''f'' at the center of the interval, ''c'': if f'(''c'')=0, then this is the minimum point; if f'(''c'')>0, then the minimum must be in [''a'',''c'']; if f'(''c'')<0, then the minimum must be in [''c'',''z'']. This method has linear convergence with rate 0.5.
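As a minimal sketch (assuming the derivative ''df'' of a unimodal ''f'' is available), derivative bisection might look like:

```python
def bisection_line_search(df, a, z, tol=1e-8):
    """Locate the minimum of a differentiable unimodal f on [a, z] using
    only its derivative df.  The interval halves each iteration, giving
    linear convergence with rate 0.5."""
    while z - a > tol:
        c = (a + z) / 2
        g = df(c)
        if g == 0:      # stationary point found exactly
            return c
        elif g > 0:     # f increasing at c: minimum in [a, c]
            z = c
        else:           # f decreasing at c: minimum in [c, z]
            a = c
    return (a + z) / 2
```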

=== Curve-fitting methods ===
Curve-fitting methods try to attain [[superlinear convergence]] by assuming that ''f'' has some analytic form, e.g. a polynomial of finite degree. At each iteration, there is a set of "working points" in which we know the value of ''f'' (and possibly also its derivative). Based on these points, we can compute a polynomial that fits the known values, and find its minimum analytically. The minimum point becomes a new working point, and we proceed to the next iteration:<ref name=":0" />{{Rp|location=sec.5}}

* [[Newton's method in optimization|Newton's method]] is a special case of a curve-fitting method, in which the curve is a degree-two polynomial, constructed using the first and second derivatives of ''f''. If the method is started close enough to a non-degenerate local minimum (= with a positive second derivative), then it has [[quadratic convergence]].
* [[Regula falsi]] is another method that fits the function to a degree-two polynomial, but it uses the first derivative at two points, rather than the first and second derivative at the same point. If the method is started close enough to a non-degenerate local minimum, then it has superlinear convergence of order <math>\varphi \approx 1.618</math>.
* ''Cubic fit'' fits a degree-three polynomial, using the function value and its derivative at each of the last two points. If the method is started close enough to a non-degenerate local minimum, then it has [[quadratic convergence]].
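The Newton step from the first bullet can be sketched as follows (an illustrative Python sketch, assuming both derivatives are available; the starting point and test function are hypothetical):

```python
import math

def newton_1d(df, d2f, x0, tol=1e-10, max_iter=100):
    """Newton's method for 1-D minimization: at each iterate, fit a
    quadratic using f'(x) and f''(x) and jump to its stationary point.
    Quadratic convergence near a minimum with f''(x*) > 0; may diverge
    if started far away."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)   # Newton step for solving f'(x) = 0
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: minimize f(x) = cos(x) starting near x = 3; the minimizer is pi.
x_star = newton_1d(lambda x: -math.sin(x), lambda x: -math.cos(x), 3.0)
```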

Curve-fitting methods have superlinear convergence when started close enough to the local minimum, but might diverge otherwise. ''Safeguarded curve-fitting methods'' execute a linear-convergence method in parallel with the curve-fitting method. At each iteration they check whether the point found by the curve-fitting method is close enough to the interval maintained by the safeguard method; if it is not, then the safeguard method is used to compute the next iterate.<ref name=":0" />{{Rp|location=5.2.3.4}}

== Multi-dimensional line search ==
In general, we have a multi-dimensional [[objective function]] <math>f:\mathbb R^n\to\mathbb R</math>. The line-search method first finds a [[descent direction]] along which the objective function <math>f</math> will be reduced, and then computes a step size that determines how far <math>\mathbf{x}</math> should move along that direction. The descent direction can be computed by various methods, such as [[gradient descent]] or [[quasi-Newton method]]. The step size can be determined either exactly or inexactly. Here is an example gradient method that uses a line search in step 2.3:

# Set iteration counter <math>k=0</math> and make an initial guess <math>\mathbf{x}_0</math> for the minimum. Pick <math>\epsilon</math> a tolerance.
# Loop:
## Compute a [[descent direction]] <math>\mathbf{p}_k</math>.
## Define a one-dimensional function <math>h(\alpha_k)=f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)</math>, representing the function value on the descent direction given the step-size.
## Find an <math>\displaystyle \alpha_k</math> that minimizes <math>h</math> over <math>\alpha_k\in\mathbb R_+</math>.
## Update <math>\mathbf{x}_{k+1}=\mathbf{x}_k+\alpha_k\mathbf{p}_k</math>, and <math display="inline"> k=k+1</math>
# Until <math>\|\nabla f(\mathbf{x}_{k+1})\|<\epsilon</math>

At the line search step (2.3), the algorithm may minimize ''h'' ''exactly'', by solving <math>h'(\alpha_k)=0</math>, or ''approximately'', by using one of the one-dimensional line-search methods mentioned above. It can also be solved ''loosely'', by asking for a sufficient decrease in ''h'' that does not necessarily approximate the optimum. One example of exact minimization is the [[conjugate gradient method]]. Loose minimization is called inexact line search and may be performed in a number of ways, such as a [[backtracking line search]] or using the [[Wolfe conditions]].
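The loop above, with an inexact (backtracking) line search in step 2.3, can be sketched in Python (a minimal illustration; the sufficient-decrease constant <code>c1</code> and the halving factor are conventional choices, not from the source):

```python
import math

def gradient_descent(f, grad, x0, eps=1e-6, max_iter=1000):
    """Gradient method with a backtracking (Armijo) inexact line search,
    following the loop in the text: descent direction p_k = -grad f(x_k),
    step size alpha_k halved until a sufficient-decrease condition holds."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < eps:  # stop when ||grad f(x)|| < eps
            break
        p = [-gi for gi in g]                          # steepest-descent direction
        slope = sum(gi * pi for gi, pi in zip(g, p))   # directional derivative h'(0) < 0
        alpha, c1 = 1.0, 1e-4
        # Backtracking: shrink alpha until f(x + alpha p) <= f(x) + c1 * alpha * h'(0).
        while f([xi + alpha * pi for xi, pi in zip(x, p)]) > f(x) + c1 * alpha * slope:
            alpha *= 0.5
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x
```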

== Overcoming local minima ==
Like other optimization methods, line search may be combined with [[simulated annealing]] to allow it to jump over some [[local minimum|local minima]].

==See also==
* [[Trust region]] – a dual approach for finding a local minimum: it first computes a step size, and then determines the descent direction.
* [[Grid search]]
* [[Learning rate]]
* [[Pattern search (optimization)]]
* [[Secant method]]


==References==
{{Reflist}}
* N. I. M. Gould and S. Leyffer, An introduction to algorithms for nonlinear optimization. In J. F. Blowey, A. W. Craig, and T. Shardlow, Frontiers in Numerical Analysis, pages 109-197. Springer Verlag, Berlin, 2003.

==Further reading==
*{{cite book |first=J. E. Jr. |last=Dennis |first2=Robert B. |last2=Schnabel |chapter=Globally Convergent Modifications of Newton's Method |title=Numerical Methods for Unconstrained Optimization and Nonlinear Equations |location=Englewood Cliffs |publisher=Prentice-Hall |year=1983 |isbn=0-13-627216-9 |pages=111–154 }}
*{{cite book |first=Jorge |last=Nocedal |first2=Stephen J. |last2=Wright |chapter=Line Search Methods |title=Numerical Optimization |location=New York |publisher=Springer |year=1999 |isbn=0-387-98793-2 |pages=34–63 }}
*{{cite book |first=Wenyu |last=Sun |first2=Ya-Xiang |last2=Yuan |chapter=Line Search |title=Optimization Theory and Methods: Nonlinear Programming |location=New York |publisher=Springer |year=2006 |isbn=0-387-24975-3 |pages=71–117 }}

{{Optimization algorithms|unconstrained}}


{{DEFAULTSORT:Line Search}}
[[Category:Optimization]]
[[Category:Optimization algorithms and methods]]

Latest revision as of 01:59, 11 August 2024