Root-finding algorithm
{{Short description|Algorithms for zeros of functions}}
In [[numerical analysis]], a '''root-finding algorithm''' is an [[algorithm]] for finding [[Zero of a function|zeros]], also called "roots", of [[continuous function]]s. A [[zero of a function]] {{math|''f''}} is a number {{math|''x''}} such that {{math|1=''f''(''x'') = 0}}. As, generally, the zeros of a function cannot be computed exactly nor expressed in [[closed form expression|closed form]], root-finding algorithms provide approximations to zeros. For functions from the [[real number]]s to real numbers or from the [[complex number]]s to the complex numbers, these are expressed either as [[floating-point arithmetic|floating-point]] numbers without error bounds or as floating-point values together with error bounds. The latter, approximations with error bounds, are equivalent to small isolating [[interval (mathematics)|intervals]] for real roots or [[disk (mathematics)|disks]] for complex roots.<ref>{{Cite book |last1=Press |first1=W. H. |title=Numerical Recipes: The Art of Scientific Computing |last2=Teukolsky |first2=S. A. |last3=Vetterling |first3=W. T. |last4=Flannery |first4=B. P. |publisher=Cambridge University Press |year=2007 |isbn=978-0-521-88068-8 |edition=3rd |publication-place=New York |chapter=Chapter 9. Root Finding and Nonlinear Sets of Equations |chapter-url=http://apps.nrbook.com/empanel/index.html#pg=442}}</ref>
[[Equation solving|Solving an equation]] {{math|1=''f''(''x'') = ''g''(''x'')}} is the same as finding the roots of the function {{math|1=''h''(''x'') = ''f''(''x'') – ''g''(''x'')}}. Thus root-finding algorithms can be used to solve any [[equation (mathematics)|equation]] of continuous functions. However, most root-finding algorithms do not guarantee that they will find all roots of a function, and if such an algorithm does not find any root, that does not necessarily mean that no root exists.
Most numerical root-finding methods are [[Iterative method|iterative methods]], producing a [[sequence]] of numbers that ideally converges towards a root as a [[Limit of a sequence|limit]]. They require one or more ''initial guesses'' of the root as starting values; each iteration of the algorithm then produces a successively more accurate approximation to the root. Since the iteration must be stopped at some point, these methods produce an approximation to the root, not an exact solution. Many methods compute subsequent values by evaluating an auxiliary function on the preceding values. The limit is thus a [[Fixed point (mathematics)|fixed point]] of the auxiliary function, which is chosen for having the roots of the original equation as fixed points and for converging rapidly to these fixed points.
The behavior of general root-finding algorithms is studied in [[numerical analysis]]. However, for polynomials specifically, the study of root-finding algorithms belongs to [[computer algebra]], since algebraic properties of polynomials are fundamental for the most efficient algorithms. The efficiency and applicability of an algorithm may depend sensitively on the characteristics of the given functions. For example, many algorithms use the [[derivative]] of the input function, while others work on every [[continuous function]]. In general, numerical algorithms are not guaranteed to find all the roots of a function, so failing to find a root does not prove that there is no root. However, for [[polynomial]]s, there are specific algorithms that use algebraic properties for certifying that no root is missed and for locating the roots in separate intervals (or disks for complex roots) that are small enough to ensure the convergence of numerical methods (typically [[Newton's method]]) to the unique root within each interval (or disk).
== Bracketing methods ==
Bracketing methods determine successively smaller intervals (brackets) that contain a root. When the interval is small enough, a root is considered found. These generally use the [[intermediate value theorem]], which asserts that if a continuous function has values of opposite signs at the end points of an interval, then the function has at least one root in the interval. Therefore, they require starting with an interval such that the function takes opposite signs at the end points of the interval. However, in the case of [[polynomial]]s there are other methods such as [[Descartes' rule of signs]], [[Budan's theorem]] and [[Sturm's theorem]] for bounding or determining the number of roots in an interval. They lead to efficient algorithms for [[real-root isolation]] of polynomials, which find all real roots with a guaranteed accuracy.
=== Bisection method ===
The simplest root-finding algorithm is the [[bisection method]]. Let {{math|''f''}} be a [[continuous function]] for which one knows an interval {{math|[''a'', ''b'']}} such that {{math|''f''(''a'')}} and {{math|''f''(''b'')}} have opposite signs (a bracket). Let {{math|1=''c'' = (''a'' + ''b'')/2}} be the middle of the interval (the midpoint, or the point that bisects the interval). Then either {{math|''f''(''a'')}} and {{math|''f''(''c'')}}, or {{math|''f''(''c'')}} and {{math|''f''(''b'')}} have opposite signs, and the size of the interval has been halved. Although the bisection method is robust, it gains one and only one [[bit]] of accuracy with each iteration. Therefore, the number of function evaluations required for finding an ''ε''-approximate root is <math>\log_2\frac{b-a}{\varepsilon}</math>. Other methods, under appropriate conditions, can gain accuracy faster.
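For illustration, a minimal Python sketch of the bisection loop might look as follows (the function <code>bisect</code>, its parameters, and the tolerance are illustrative, not taken from the cited sources):

<syntaxhighlight lang="python">
def bisect(f, a, b, eps=1e-12):
    """Illustrative bisection: assumes f is continuous and f(a), f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > eps:
        c = (a + b) / 2          # midpoint of the current bracket
        fc = f(c)
        if fc == 0:
            return c
        if fa * fc < 0:          # the root lies in [a, c]
            b, fb = c, fc
        else:                    # the root lies in [c, b]
            a, fa = c, fc
    return (a + b) / 2

# Example: the positive root of x^2 - 2, i.e. sqrt(2)
print(bisect(lambda x: x * x - 2, 0.0, 2.0))
</syntaxhighlight>

Each pass through the loop halves the bracket, so the number of iterations is the base-2 logarithm of the ratio between the initial bracket length and the tolerance.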
=== False position (''regula falsi'') ===

The [[false position method]], also called the ''regula falsi'' method, is similar to the bisection method, but instead of using the midpoint of the interval it uses the ''x''-intercept of the line that connects the function values at the endpoints of the interval, that is
:<math>c= \frac{af(b)-bf(a)}{f(b)-f(a)}.</math>
False position is similar to the [[secant method]], except that, instead of retaining the last two points, it makes sure to keep one point on either side of the root. The false position method can be faster than the bisection method and, unlike the secant method, it will never diverge. However, it may fail to converge in some naive implementations due to roundoff errors that may lead to a wrong sign for {{math|''f''(''c'')}}. Typically, this may occur if the [[derivative]] of {{mvar|f}} is large in the neighborhood of the root.
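A short sketch of the same loop with the bisection midpoint replaced by the secant intercept (illustrative names and tolerances, not a library routine):

<syntaxhighlight lang="python">
import math

def false_position(f, a, b, tol=1e-12, max_iter=100):
    """Illustrative regula falsi: keeps a bracket [a, b] with f(a) and f(b) of opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for _ in range(max_iter):
        # x-intercept of the line through (a, f(a)) and (b, f(b))
        c = (a * fb - b * fa) / (fb - fa)
        fc = f(c)
        if abs(fc) < tol:
            break
        if fa * fc < 0:          # root is in [a, c]
            b, fb = c, fc
        else:                    # root is in [c, b]
            a, fa = c, fc
    return c

# Example: root of cos(x) - x in [0, 1]
print(false_position(lambda x: math.cos(x) - x, 0.0, 1.0))
</syntaxhighlight>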
=== ITP method ===
The [[ITP Method|ITP method]] is the only known method to bracket the root with the same worst-case guarantees as the bisection method while guaranteeing superlinear convergence to the root of smooth functions, as the secant method does. It is also the only known method guaranteed to outperform the bisection method on average for any continuous distribution on the location of the root (see [[ITP Method#Analysis]]). It does so by keeping track of both the bracketing interval and the minmax interval, in which any point converges as fast as the bisection method. The construction of the queried point {{mvar|c}} follows three steps: interpolation (similar to the regula falsi), truncation (adjusting the regula falsi similar to [[Regula falsi#Improvements in ''regula falsi''|Regula falsi § Improvements in ''regula falsi'']]), and then projection onto the minmax interval. The combination of these steps produces a simultaneously minmax optimal method with guarantees similar to interpolation-based methods for smooth functions, and in practice it will outperform both the bisection method and interpolation-based methods applied to both smooth and non-smooth functions.
== Interpolation ==
Many root-finding processes work by [[interpolation]]. This consists of using the last computed approximate values of the root to approximate the function by a [[polynomial]] of low degree, which takes the same values at these approximate roots. Then the root of the polynomial is computed and used as a new approximate value of the root of the function, and the process is iterated.

Interpolating two values yields a line: a polynomial of degree one. This is the basis of the [[secant method]]. ''Regula falsi'' is also an interpolation method that interpolates two points at a time, but it differs from the secant method by using two points that are not necessarily the last two computed points. Three values define a parabolic curve: a [[quadratic function]]. This is the basis of [[Muller's method]].
== Iterative methods ==
Although all root-finding algorithms proceed by [[iteration]], an [[iterative method|iterative]] root-finding method generally uses a specific type of iteration, consisting of defining an auxiliary function, which is applied to the last computed approximations of a root to obtain a new approximation. The iteration stops when a [[fixed-point iteration|fixed point]] of the auxiliary function is reached to the desired precision, i.e., when a new computed value is sufficiently close to the preceding ones.
=== Newton's method (and similar derivative-based methods) ===
[[Newton's method]] assumes the function ''f'' to have a continuous [[derivative]]. Newton's method may not converge if started too far away from a root. However, when it does converge, it is faster than the bisection method; its [[order of convergence]] is usually quadratic whereas the bisection method's is linear. Newton's method is also important because it readily generalizes to higher-dimensional problems. [[Householder's method]]s are a class of Newton-like methods with higher orders of convergence. The first one after Newton's method is [[Halley's method]] with cubic order of convergence.
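A minimal sketch of the iteration, assuming the caller supplies the derivative (names are illustrative):

<syntaxhighlight lang="python">
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Illustrative Newton iteration: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        dfx = df(x)
        if dfx == 0:
            raise ZeroDivisionError("derivative vanished; Newton step is undefined")
        x -= fx / dfx
    return x

# Example: sqrt(2) as the positive root of x^2 - 2, starting from 1
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))
</syntaxhighlight>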
=== Secant method ===
Replacing the derivative in Newton's method with a [[finite difference]], we get the [[secant method]]. This method does not require the computation (nor the existence) of a derivative, but the price is slower convergence (the order of convergence is the [[golden ratio]], approximately 1.62<ref>{{Cite web |last=Chanson |first=Jeffrey R. |date=October 3, 2024 |title=Order of Convergence |url=https://math.libretexts.org/Bookshelves/Applied_Mathematics/Numerical_Methods_(Chasnov)/02%3A_Root_Finding/2.04%3A_Order_of_Convergence |access-date=October 3, 2024 |website=LibreTexts Mathematics}}</ref>). A generalization of the secant method in higher dimensions is [[Broyden's method]].
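The corresponding sketch with the derivative replaced by the divided difference of the last two iterates (illustrative):

<syntaxhighlight lang="python">
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Illustrative secant iteration using the two most recent iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 - f0 == 0:
            break                      # flat secant line; cannot continue
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
        if abs(f1) < tol:
            break
    return x1

# Example: real root of x^3 - x - 2, near 1.5214
print(secant(lambda x: x ** 3 - x - 2, 1.0, 2.0))
</syntaxhighlight>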
=== Steffensen's method ===
If we use a polynomial fit to remove the quadratic part of the finite difference used in the secant method, so that it better approximates the derivative, we obtain [[Steffensen's method]], which has quadratic convergence, and whose behavior (both good and bad) is essentially the same as Newton's method but does not require a derivative.
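A sketch of the resulting derivative-free iteration (illustrative; the auxiliary slope plays the role of the derivative):

<syntaxhighlight lang="python">
def steffensen(f, x0, tol=1e-12, max_iter=50):
    """Illustrative Steffensen iteration: a derivative-free analogue of Newton's method."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        # slope of f estimated from the auxiliary point x + f(x)
        g = (f(x + fx) - fx) / fx
        if g == 0:
            raise ZeroDivisionError("divided difference vanished")
        x -= fx / g
    return x

# Example: sqrt(2) as the positive root of x^2 - 2, starting from 1.5
print(steffensen(lambda x: x * x - 2, 1.5))
</syntaxhighlight>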
=== Fixed point iteration method ===
We can use [[fixed-point iteration]] to find the root of a function. Given a function <math>f(x)</math> which we have set to zero to find the root (<math>f(x)=0</math>), we rewrite the equation in terms of <math>x</math> so that <math>f(x)=0</math> becomes <math>x=g(x)</math> (note that there are often many possible choices of <math>g(x)</math> for a given <math>f(x)=0</math>). Next, we relabel each side of the equation as <math>x_{n+1}=g(x_{n})</math> so that we can perform the iteration. Next, we pick a starting value <math>x_{1}</math> and perform the iteration until it converges towards a root of the function to the desired precision. If the iteration converges, its limit is a fixed point of <math>g</math>, and hence a root of <math>f</math>. The iteration converges locally to a root <math>r</math> if <math>|g'(r)|<1</math>; if <math>|g'(r)|>1</math>, it diverges from that root.
As an example of converting <math> f(x)=0 </math> to <math> x=g(x) </math>, given the function <math> f(x)=x^2+x-1 </math>, we may rewrite it as any of the following iterations:
:<math> x_{n+1}=(1/x_n) - 1 </math>,
:<math> x_{n+1}=1/(x_n+1) </math>,
:<math> x_{n+1}=1-x_n^2 </math>,
:<math> x_{n+1}=x_n^2+2x_n-1 </math>, or
:<math> x_{n+1}=\pm \sqrt{1-x_n} </math>.
=== Inverse interpolation ===
The appearance of complex values in interpolation methods can be avoided by interpolating the [[inverse function|inverse]] of ''f'', resulting in the [[inverse quadratic interpolation]] method. Again, convergence is asymptotically faster than the secant method, but inverse quadratic interpolation often behaves poorly when the iterates are not close to the root.
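A sketch of a single inverse quadratic interpolation step (illustrative): three iterates are interpolated with {{mvar|x}} treated as a function of {{mvar|y}}, and the interpolating parabola is evaluated at {{math|1=''y'' = 0}}.

<syntaxhighlight lang="python">
import math

def inverse_quadratic_step(f, x0, x1, x2):
    """One inverse quadratic interpolation step from three iterates with distinct f-values."""
    f0, f1, f2 = f(x0), f(x1), f(x2)
    # Lagrange interpolation of x as a function of y, evaluated at y = 0
    return (x0 * f1 * f2 / ((f0 - f1) * (f0 - f2))
            + x1 * f0 * f2 / ((f1 - f0) * (f1 - f2))
            + x2 * f0 * f1 / ((f2 - f0) * (f2 - f1)))

# Example: a few steps towards the root of cos(x) - x
def f(x):
    return math.cos(x) - x

x0, x1, x2 = 0.0, 0.5, 1.0
for _ in range(4):
    x0, x1, x2 = x1, x2, inverse_quadratic_step(f, x0, x1, x2)
print(x2)          # approximately 0.739085
</syntaxhighlight>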
== Combinations of methods ==

=== Brent's method ===

[[Brent's method]] is a combination of the bisection method, the secant method and [[inverse quadratic interpolation]]. At every iteration, Brent's method decides which method out of these three is likely to do best, and proceeds by doing a step according to that method. This gives a robust and fast method, which therefore enjoys considerable popularity.
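In practice Brent's method is usually taken from a library rather than implemented by hand; for example, SciPy exposes it as <code>scipy.optimize.brentq</code> (usage sketch, assuming SciPy is installed):

<syntaxhighlight lang="python">
import math
from scipy.optimize import brentq

# Root of cos(x) - x on the bracket [0, 1]
root = brentq(lambda x: math.cos(x) - x, 0.0, 1.0)
print(root)        # approximately 0.739085
</syntaxhighlight>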
=== Ridders' method ===

[[Ridders' method]] is a hybrid method that uses the value of the function at the midpoint of the interval to perform an exponential interpolation to the root. This gives fast convergence, while guaranteeing that at most twice the number of iterations of the bisection method are needed.
== Roots of polynomials {{anchor|Polynomials}} ==

Finding the roots of [[polynomial]]s is a long-standing problem that has been the object of much research throughout history. A testament to this is that, up until the 19th century, [[algebra]] meant essentially the [[theory of equations|theory of polynomial equations]].
Finding the root of a [[linear polynomial]] (degree one) is easy and needs only one division. For [[quadratic polynomial]]s (degree two), the [[quadratic formula]] produces a solution, but its numerical evaluation may require some care for ensuring [[numerical stability]]. For degrees three and four, there are closed-form solutions in terms of [[radical expression|radicals]], which are generally not convenient for numerical evaluation, as being too complicated and involving the computation of several [[nth root|{{mvar|n}}th roots]] whose computation is not easier than the direct computation of the roots of the polynomial (for example the expression of the real roots of a [[cubic polynomial]] may involve non-real [[cube root]]s). For polynomials of degree five or higher, the [[Abel–Ruffini theorem]] asserts that there is, in general, no radical expression of the roots.

So, except for very low degrees, root finding of polynomials consists of finding approximations of the roots. By the [[fundamental theorem of algebra]], one knows that a polynomial of degree {{mvar|n}} has at most {{mvar|n}} real or complex roots, and this number is reached for almost all polynomials.

It follows that the problem of root finding for polynomials may be split into three different subproblems:
* Finding one root
* Finding all roots
* Finding roots in a specific region of the [[complex plane]], typically the real roots or the real roots in a given interval (for example, when roots represent a physical quantity, only the real positive ones are interesting).
For finding one root, [[Newton's method]] and other general [[iterative method]]s generally work well.

For finding all the roots, the oldest method is, when a root {{mvar|r}} has been found, to divide the polynomial by {{math|''x'' – ''r''}}, and restart iteratively the search for a root of the quotient polynomial. However, except for low degrees, this does not work well because of [[numerical instability]]: [[Wilkinson's polynomial]] shows that a very small modification of one coefficient may change dramatically not only the value of the roots, but also their nature (real or complex). Also, even with a good approximation, when one evaluates a polynomial at an approximate root, one may get a result that is far from zero. For example, if a polynomial of degree 20 (the degree of Wilkinson's polynomial) has a root close to 10, the derivative of the polynomial at the root may be of the order of <math>10^{20};</math> this implies that an error of <math>10^{-10}</math> on the value of the root may produce a value of the polynomial at the approximate root that is of the order of <math>10^{10}.</math>
To avoid these problems, methods have been elaborated that compute all roots simultaneously, to any desired accuracy. Presently the most efficient such method is the [[Aberth method]]. A [[free software|free]] implementation is available under the name of [[MPSolve]]. This is a reference implementation, which can routinely find the roots of polynomials of degree larger than 1,000, with more than 1,000 significant decimal digits.

The methods for computing all roots may be used for computing real roots. However, it may be difficult to decide whether a root with a small imaginary part is real or not. Moreover, as the number of real roots is, on average, the logarithm of the degree, it is a waste of computer resources to compute the non-real roots when one is interested only in real roots.

The oldest method for computing the number of real roots, and the number of roots in an interval, results from [[Sturm's theorem]], but the methods based on [[Descartes' rule of signs]] and its extensions—[[Budan's theorem|Budan's]] and [[Vincent's theorem]]s—are generally more efficient. For root finding, all proceed by reducing the size of the intervals in which roots are searched until getting intervals containing zero or one root. Then the intervals containing one root may be further reduced for getting a quadratic convergence of [[Newton's method]] to the isolated roots. The main [[computer algebra system]]s ([[Maple (software)|Maple]], [[Mathematica]], [[SageMath]], [[PARI/GP]]) each have a variant of this method as the default algorithm for the real roots of a polynomial.
===Finding one root===
The most widely used method for computing a root is [[Newton's method]], which consists of iterating the computation of
:<math>x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)},</math>
by starting from a well-chosen value <math>x_0.</math>
If {{mvar|f}} is a polynomial, the computation is faster when using [[Horner's method|Horner's rule]] for computing the polynomial and its derivative.
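As an illustrative sketch (not from the cited sources), Horner's rule evaluates the polynomial and its derivative in a single pass over the coefficients, and the result feeds the Newton step:

<syntaxhighlight lang="python">
def horner_with_derivative(coeffs, x):
    """Evaluate p(x) and p'(x) by Horner's rule; coeffs are listed from highest to lowest degree."""
    p, dp = 0.0, 0.0
    for c in coeffs:
        dp = dp * x + p        # derivative is accumulated before p is updated
        p = p * x + c
    return p, dp

def newton_polynomial(coeffs, x0, tol=1e-12, max_iter=100):
    """Illustrative Newton iteration for a polynomial given by its coefficients."""
    x = x0
    for _ in range(max_iter):
        p, dp = horner_with_derivative(coeffs, x)
        if abs(p) < tol or dp == 0:
            break
        x -= p / dp
    return x

# Example: p(x) = x^3 - 2x - 5, which has a real root near 2.0945515
print(newton_polynomial([1, 0, -2, -5], 2.0))
</syntaxhighlight>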
The convergence is generally [[quadratic convergence|quadratic]], but it may converge much more slowly or even not converge at all. In particular, if the polynomial has no real root and <math>x_0</math> is real, then Newton's method cannot converge. However, if the polynomial has a real root that is larger than the largest real root of its derivative, then Newton's method converges quadratically to this largest root if <math>x_0</math> is larger than it (there are easy ways for computing an upper bound on the roots; see [[Properties of polynomial roots]]). This is the starting point of [[Horner's method]] for computing the roots.

When one root {{mvar|r}} has been found, one may use [[Euclidean division of polynomials|Euclidean division]] for removing the factor {{math|''x'' – ''r''}} from the polynomial. Computing a root of the resulting quotient, and repeating the process provides, in principle, a way for computing all roots. However, this iterative scheme is numerically unstable; the approximation errors accumulate during the successive factorizations, so that the last roots are determined with a polynomial that deviates widely from a factor of the original polynomial. To reduce this error, one may, for each root that is found, restart Newton's method with the original polynomial, and this approximate root as starting value.
However, there is no guarantee that this will allow finding all roots. In fact, the problem of finding the roots of a polynomial from its coefficients is in general highly [[ill-conditioned]]. This is illustrated by [[Wilkinson's polynomial]]: the roots of this polynomial of degree 20 are the 20 first positive integers; changing the last bit of the 32-bit representation of one of its coefficients (equal to –210) produces a polynomial with only 10 real roots and 10 complex roots with imaginary parts larger than 0.6.

Closely related to Newton's method are [[Halley's method]] and [[Laguerre's method]]. Both use the polynomial and its first two derivatives for an iterative process that has [[cubic convergence]]. Combining two consecutive steps of these methods into a single test, one gets a [[rate of convergence]] of 9, at the cost of 6 polynomial evaluations (with Horner's rule). On the other hand, combining three steps of Newton's method gives a rate of convergence of 8 at the cost of the same number of polynomial evaluations. This gives a slight advantage to these methods (less clear for Laguerre's method, as a square root has to be computed at each step).

When applying these methods to polynomials with real coefficients and real starting points, Newton's and Halley's methods stay inside the real number line. One has to choose complex starting points to find complex roots. In contrast, the Laguerre method with a square root in its evaluation will leave the real axis of its own accord.

Another class of methods is based on converting the problem of finding polynomial roots to the problem of finding eigenvalues of the [[companion matrix]] of the polynomial. In principle, one can use any [[eigenvalue algorithm]] to find the roots of the polynomial. However, for efficiency reasons one prefers methods that employ the structure of the matrix, that is, can be implemented in matrix-free form. Among these methods are the [[power method]], whose application to the transpose of the companion matrix is the classical [[Bernoulli's method]] to find the root of greatest modulus. The [[inverse power method]] with shifts, which finds some smallest root first, is what drives the complex (''cpoly'') variant of the [[Jenkins–Traub algorithm]] and gives it its numerical stability. Additionally, it is insensitive to multiple roots and has fast convergence with order <math>1+\varphi\approx 2.6</math> (where <math>\varphi</math> is the [[golden ratio]]) even in the presence of clustered roots. This fast convergence comes at a cost of three polynomial evaluations per step, resulting in a residual of {{math|''O''({{!}}''f''(''x''){{!}}<sup>2+3''φ''</sup>)}}, which is a slower convergence than with three steps of Newton's method.
===Finding roots in pairs===

If the given polynomial only has real coefficients, one may wish to avoid computations with complex numbers. To that effect, one has to find quadratic factors for pairs of conjugate complex roots. The application of the [[multidimensional Newton's method]] to this task results in [[Bairstow's method]].

The real variant of the [[Jenkins–Traub algorithm]] is an improvement of this method.

===Finding all roots at once===
The simple [[Durand–Kerner method|Durand–Kerner]] and the slightly more complicated [[Aberth method]] simultaneously find all of the roots using only simple [[complex number]] arithmetic. Accelerated algorithms for multi-point evaluation and interpolation similar to the [[fast Fourier transform]] can help speed them up for large degrees of the polynomial. It is advisable to choose an asymmetric, but evenly distributed set of initial points. The implementation of this method in the [[free software]] [[MPSolve]] is a reference for its efficiency and its accuracy.
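An illustrative sketch of the Durand–Kerner iteration for a monic polynomial (the starting points and iteration count are arbitrary choices, not prescribed by the method):

<syntaxhighlight lang="python">
def durand_kerner(coeffs, iterations=100):
    """Illustrative Durand–Kerner iteration; coeffs from highest to lowest degree, leading coefficient 1."""
    n = len(coeffs) - 1

    def p(z):
        result = 0
        for c in coeffs:
            result = result * z + c
        return result

    # a common choice of distinct starting points: powers of 0.4 + 0.9i
    roots = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        for i in range(n):
            prod = 1
            for j in range(n):
                if j != i:
                    prod *= roots[i] - roots[j]
            roots[i] -= p(roots[i]) / prod
    return roots

# Example: x^3 - 3x^2 + 3x - 5 = (x - 1)^3 - 4; one real root near 2.5874 and two complex conjugates
for r in durand_kerner([1, -3, 3, -5]):
    print(r)
</syntaxhighlight>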
Another method with this style is the [[Graeffe's method|Dandelin–Gräffe method]] (sometimes also ascribed to [[Nikolai Ivanovich Lobachevsky|Lobachevsky]]), which uses [[polynomial transformations]] to repeatedly and implicitly square the roots. This greatly magnifies variances in the roots. Applying [[Vieta's formulas|Viète's formulas]], one obtains easy approximations for the modulus of the roots, and with some more effort, for the roots themselves.

===Exclusion and enclosure methods===

Several fast tests exist that tell if a segment of the real line or a region of the complex plane contains no roots. By bounding the modulus of the roots and recursively subdividing the initial region indicated by these bounds, one can isolate small regions that may contain roots and then apply other methods to locate them exactly.
All these methods involve finding the coefficients of shifted and scaled versions of the polynomial. For large degrees, [[fast Fourier transform|FFT]]-based accelerated methods become viable. For real roots, see the next sections.
The [[Lehmer–Schur algorithm]] uses the [[Lehmer–Schur algorithm#Schur–Cohn test|Schur–Cohn test]] for circles; a variant, [[Lehmer–Schur algorithm#Wilf's global bisection algorithm|Wilf's global bisection algorithm]] uses a winding number computation for rectangular regions in the complex plane.

The [[splitting circle method]] uses FFT-based polynomial transformations to find large-degree factors corresponding to clusters of roots. The precision of the factorization is maximized using a Newton-type iteration. This method is useful for finding the roots of polynomials of high degree to arbitrary precision; it has almost optimal complexity in this setting.{{citation needed|date=November 2018}}
===Real-root isolation===
{{main|Real-root isolation}}

Finding the real roots of a polynomial with real coefficients is a problem that has received much attention since the beginning of the 19th century, and is still an active domain of research. Most root-finding algorithms can find some real roots, but cannot certify having found all the roots. Methods for finding all complex roots, such as the [[Aberth method]], can provide the real roots. However, because of the numerical instability of polynomials (see [[Wilkinson's polynomial]]), they may need [[arbitrary-precision arithmetic]] for deciding which roots are real. Moreover, they compute all complex roots when only a few are real.

It follows that the standard way of computing real roots is to compute first disjoint intervals, called ''isolating intervals'', such that each one contains exactly one real root, and together they contain all the roots. This computation is called ''real-root isolation''. Having isolating intervals, one may use fast numerical methods, such as [[Newton's method]], for improving the precision of the result.

The oldest complete algorithm for real-root isolation results from [[Sturm's theorem]]. However, it appears to be much less efficient than the methods based on [[Descartes' rule of signs]] and [[Vincent's theorem]]. These methods divide into two main classes, one using [[continued fraction]]s and the other using bisection. Both methods have been dramatically improved since the beginning of the 21st century. With these improvements they reach a [[computational complexity]] that is similar to that of the best algorithms for computing all the roots (even when all roots are real).

These algorithms have been implemented and are available in [[Mathematica]] (continued fraction method) and [[Maple (software)|Maple]] (bisection method). Both implementations can routinely find the real roots of polynomials of degree higher than 1,000.
===Finding multiple roots of polynomials===
{{main|Square-free factorization}}

Most root-finding algorithms behave badly when there are [[Multiplicity (mathematics)|multiple roots]] or very close roots. However, for polynomials whose coefficients are exactly given as [[integer]]s or [[rational number]]s, there is an efficient method to factorize them into factors that have only simple roots and whose coefficients are also exactly given. This method, called '''[[square-free factorization]]''', is based on the multiple roots of a polynomial being the roots of the [[polynomial greatest common divisor|greatest common divisor]] of the polynomial and its derivative.

The square-free factorization of a polynomial ''p'' is a factorization <math>p=p_1p_2^2\cdots p_k^k </math> where each <math>p_i</math> is either 1 or a polynomial without multiple roots, and two different <math>p_i</math> do not have any common root.
An efficient method to compute this factorization is [[Square-free factorization#Yun's algorithm|Yun's algorithm]].
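For illustration, a square-free part can be computed with a general-purpose computer algebra library such as SymPy by dividing the polynomial by the GCD with its derivative (a sketch of the underlying idea, not Yun's algorithm itself):

<syntaxhighlight lang="python">
from sympy import symbols, gcd, div, diff, expand

x = symbols('x')
# p has a double root at 1 and simple roots at 2 and -3
p = expand((x - 1)**2 * (x - 2) * (x + 3))

g = gcd(p, diff(p, x))               # captures the repeated factor (x - 1)
square_free, remainder = div(p, g, x)

print(square_free)                   # x**3 - 7*x + 6, i.e. (x - 1)*(x - 2)*(x + 3)
print(remainder)                     # 0
</syntaxhighlight>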
== Finding roots in higher dimensions ==

The [[bisection method]] has been generalized to higher dimensions; these methods are called '''generalized bisection methods'''.<ref name=":0">{{Cite journal |last1=Mourrain |first1=B. |last2=Vrahatis |first2=M. N. |last3=Yakoubsohn |first3=J. C. |date=2002-06-01 |title=On the Complexity of Isolating Real Roots and Computing with Certainty the Topological Degree |journal=Journal of Complexity |language=en |volume=18 |issue=2 |pages=612–640 |doi=10.1006/jcom.2001.0636 |issn=0885-064X|doi-access=free }}</ref><ref>{{Cite book |last=Vrahatis |first=Michael N. |date=2020 |editor-last=Sergeyev |editor-first=Yaroslav D. |editor2-last=Kvasov |editor2-first=Dmitri E. |chapter=Generalizations of the Intermediate Value Theorem for Approximating Fixed Points and Zeros of Continuous Functions |chapter-url=https://link.springer.com/chapter/10.1007/978-3-030-40616-5_17 |title=Numerical Computations: Theory and Algorithms |series=Lecture Notes in Computer Science |language=en |location=Cham |publisher=Springer International Publishing |volume=11974 |pages=223–238 |doi=10.1007/978-3-030-40616-5_17 |isbn=978-3-030-40616-5 |s2cid=211160947}}</ref> At each iteration, the domain is partitioned into two parts, and the algorithm decides, based on a small number of function evaluations, which of these two parts must contain a root. In one dimension, the criterion for decision is that the function has opposite signs. The main challenge in extending the method to multiple dimensions is to find a criterion that can be computed easily and that guarantees the existence of a root.

The [[Poincaré–Miranda theorem]] gives a criterion for the existence of a root in a rectangle, but it is hard to verify because it requires evaluating the function on the entire boundary of the rectangle.

Another criterion is given by a theorem of [[Leopold Kronecker|Kronecker]].<ref>{{Cite book |last1=Ortega |first1= James M. |last2=Rheinboldt |first2=Werner C. |title=Iterative solution of nonlinear equations in several variables |date=2000 |publisher=Society for Industrial and Applied Mathematics |isbn=978-0-89871-461-6 }}</ref>{{Page needed|date=December 2024}} It says that, if the [[Degree of a continuous mapping|topological degree]] of a function ''f'' on a rectangle is non-zero, then the rectangle must contain at least one root of ''f''. This criterion is the basis for several root-finding methods, such as those of Stenger<ref>{{Cite journal |last=Stenger |first=Frank |date=1975-03-01 |title=Computing the topological degree of a mapping in Rn |url=https://doi.org/10.1007/BF01419526 |journal=Numerische Mathematik |language=en |volume=25 |issue=1 |pages=23–38 |doi=10.1007/BF01419526 |s2cid=122196773 |issn=0945-3245}}</ref> and Kearfott.<ref>{{Cite journal |last=Kearfott |first=Baker |date=1979-06-01 |title=An efficient degree-computation method for a generalized method of bisection |url=https://doi.org/10.1007/BF01404868 |journal=Numerische Mathematik |volume=32 |issue=2 |pages=109–127 |doi=10.1007/BF01404868 |s2cid=122058552 |issn=0029-599X}}</ref> However, computing the topological degree can be time-consuming.

A third criterion is based on a ''characteristic polyhedron''. This criterion is used by a method called characteristic bisection.<ref name=":0" />{{Rp|page=19--}} It does not require computing the topological degree; it only requires computing the signs of function values. The number of required evaluations is at least <math>\log_2(D/\epsilon)</math>, where ''D'' is the length of the longest edge of the characteristic polyhedron.<ref name=":2">{{Cite journal |last1=Vrahatis |first1=M. N. |last2=Iordanidis |first2=K. I. |date=1986-03-01 |title=A Rapid Generalized Method of Bisection for Solving Systems of Non-linear Equations |url=https://doi.org/10.1007/BF01389620 |journal=Numerische Mathematik |language=en |volume=49 |issue=2 |pages=123–138 |doi=10.1007/BF01389620 |issn=0945-3245 |s2cid=121771945}}</ref>{{Rp|page=11|location=Lemma.4.7}} Note that Vrahatis and Iordanidis<ref name=":2" /> prove a lower bound on the number of evaluations, not an upper bound.

A fourth method uses an [[intermediate value theorem]] on simplices.<ref>{{Cite journal |last=Vrahatis |first=Michael N. |date=2020-04-15 |title=Intermediate value theorem for simplices for simplicial approximation of fixed points and zeros |journal=Topology and Its Applications |language=en |volume=275 |pages=107036 |doi=10.1016/j.topol.2019.107036 |s2cid=213249321 |issn=0166-8641|doi-access=free }}</ref> Again, no upper bound on the number of queries is given.
== See also ==
{{cols}}
* [[List of root finding algorithms]]
* [[Fixed-point computation]]
* {{annotated link|Broyden's method}}
* {{annotated link|Cryptographically secure pseudorandom number generator}}
* [[GNU Scientific Library]]
* {{annotated link|Graeffe's method}}
* {{annotated link|Lill's method}}
* {{annotated link|MPSolve}}
* {{annotated link|Multiplicity (mathematics)}}
* [[nth root algorithm]]
* {{annotated link|System of polynomial equations}}
* {{annotated link|Kantorovich theorem}}
{{colend}}
== References ==
{{reflist}}

== Further reading ==
* Victor Yakovlevich Pan: "Solving a Polynomial Equation: Some History and Recent Progress", ''SIAM Review'', Vol. 39, No. 2, pp. 187–220 (June 1997).
* John Michael McNamee: ''Numerical Methods for Roots of Polynomials – Part I'', Elsevier, ISBN 978-0-444-52729-5 (2007).
* John Michael McNamee and Victor Yakovlevich Pan: ''Numerical Methods for Roots of Polynomials – Part II'', Elsevier, ISBN 978-0-444-52730-1 (2013).
{{Root-finding algorithms}}
{{Data structures and algorithms}}
[[Category:Root-finding algorithms| ]]