{{Short description|An optimality condition in optimal control theory}}
The '''Hamilton-Jacobi-Bellman''' ('''HJB''') '''equation''' is a [[nonlinear partial differential equation]] that provides [[necessary and sufficient condition]]s for [[Optimal control theory|optimality]] of a [[Control (optimal control theory)|control]] with respect to a [[loss function]].<ref>{{cite book |first=Donald E. |last=Kirk |title=Optimal Control Theory: An Introduction |location=Englewood Cliffs, NJ |publisher=Prentice-Hall |year=1970 |isbn=0-13-638098-0 |pages=86–90 |url=https://books.google.com/books?id=fCh2SAtWIdwC&pg=PA86 }}</ref> Its solution is the [[value function]] of the optimal control problem which, once known, can be used to obtain the optimal control by taking the maximizer (or minimizer) of the [[Hamiltonian (control theory)|Hamiltonian]] involved in the HJB equation.<ref>{{cite book |first1=Jiongmin |last1=Yong |first2=Xun Yu |last2=Zhou |title=Stochastic Controls : Hamiltonian Systems and HJB Equations |publisher=Springer |year=1999 |isbn=0-387-98723-1 |pages=157–215 [p. 163] |chapter=Dynamic Programming and HJB Equations |chapter-url=https://books.google.com/books?id=CdHuD7E-7XIC&pg=PA163 }}</ref><ref>{{cite book |first=Desineni S. |last=Naidu |title=Optimal Control Systems |location=Boca Raton |publisher=CRC Press |year=2003 |isbn=0-8493-0892-5 |pages=277–283 [p. 280] |chapter=The Hamilton–Jacobi–Bellman Equation |chapter-url=https://books.google.com/books?id=hGxurdEZVtkC&pg=PA280 }}</ref>


The equation is a result of the theory of [[dynamic programming]] which was pioneered in the 1950s by [[Richard Bellman]] and coworkers.<ref>{{cite journal |first= R. E. |last=Bellman |title=Dynamic Programming and a new formalism in the calculus of variations |journal=[[Proceedings of the National Academy of Sciences of the United States of America|Proc. Natl. Acad. Sci.]] |volume=40 |issue=4 |year=1954 |pages=231–235 |doi= 10.1073/pnas.40.4.231|pmc=527981 |pmid=16589462|bibcode=1954PNAS...40..231B |doi-access=free }}</ref><ref>{{cite book |first=R. E. |last=Bellman |title=Dynamic Programming |location=Princeton, NJ|publisher=Princeton University Press |year=1957 }}</ref><ref>{{cite journal |first1=R. |last1=Bellman |first2=S. |last2=Dreyfus |title=An Application of Dynamic Programming to the Determination of Optimal Satellite Trajectories |journal=J. Br. Interplanet. Soc. |volume=17 |year=1959 |pages=78–83 }}</ref> The connection to the [[Hamilton–Jacobi equation]] from [[classical physics]] was first drawn by [[Rudolf E. Kálmán|Rudolf Kálmán]].<ref>{{cite book |first=Rudolf E. |last=Kálmán |chapter=The Theory of Optimal Control and the Calculus of Variations |title=Mathematical Optimization Techniques |editor-first=Richard |editor-last=Bellman |location=Berkeley |publisher=University of California Press |year=1963 |pages=309–331 |oclc=1033974 }}</ref> In [[Discrete time and continuous time|discrete-time]] problems, the analogous [[difference equation]] is usually referred to as the [[Bellman equation]].


While classical [[variational problem]]s, such as the [[brachistochrone problem]], can be solved using the Hamilton–Jacobi–Bellman equation,<ref>{{cite book |last=Kemajou-Brown |first=Isabelle |title=Probability on Algebraic and Geometric Structures |series=Contemporary Mathematics |volume=668 |editor-first=Gregory |editor-last=Budzban |editor2-first=Harry Randolph |editor2-last=Hughes |editor3-first=Henri |editor3-last=Schurz |year=2016 |chapter=Brief History of Optimal Control Theory and Some Recent Developments |pages=119–130 |doi=10.1090/conm/668/13400 |isbn=9781470419455 }}</ref> the method can be applied to a broader spectrum of problems. Further it can be generalized to [[stochastic]] systems, in which case the HJB equation is a second-order [[elliptic partial differential equation]].<ref>{{cite book |first=Fwu-Ranq |last=Chang |title=Stochastic Optimization in Continuous Time |location=Cambridge, UK |publisher=Cambridge University Press |year=2004 |isbn=0-521-83406-6 |pages=113–168 |url=https://books.google.com/books?id=PmIefn9u67AC&pg=PA114 }}</ref> A major drawback, however, is that the HJB equation admits classical solutions only for a [[Smoothness|sufficiently smooth]] value function, which is not guaranteed in most situations. Instead, the notion of a [[viscosity solution]] is required, in which conventional derivatives are replaced by (set-valued) [[subderivative]]s.<ref>{{cite book |first1=Martino |last1=Bardi |first2=Italo |last2=Capuzzo-Dolcetta |title=Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations |location=Boston |publisher=Birkhäuser |year=1997 |isbn=0-8176-3640-4 }}</ref>


==Optimal Control Problems==


Consider the following problem in deterministic optimal control over the time period <math>[0,T]</math>:


:<math>V(x(0), 0) = \min_u \left\{ \int_0^T C[x(t),u(t)]\,dt + D[x(T)] \right\}</math>


where <math>C[\cdot]</math> is the scalar cost rate function and <math>D[\cdot]</math> is a function that gives the [[bequest value]] at the final state, <math>x(t)</math> is the system state vector, <math>x(0)</math> is assumed given, and <math>u(t)</math> for <math> 0 \leq t \leq T</math> is the control vector that we are trying to find. Thus, <math>V(x, t)</math> is the [[value function]].


The system must also be subject to
:<math> \dot{x}(t)=F[x(t),u(t)] \, </math>

where <math>F[\cdot]</math> gives the vector determining the physical evolution of the state vector over time.
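
For instance, a minimal scalar illustration (a hypothetical linear–quadratic special case, not part of the general formulation above) is

:<math>
C[x,u] = \tfrac{1}{2}\left(q\,x^{2} + r\,u^{2}\right), \qquad D[x] = \tfrac{1}{2}\,p_{T}\,x^{2}, \qquad \dot{x}(t) = a\,x(t) + b\,u(t),
</math>

with constants <math>q, r > 0</math>, <math>p_{T} \geq 0</math> and <math>a, b</math>; for this special case the value function turns out to be quadratic in <math>x</math> and the HJB equation reduces to a [[Riccati equation]] (compare the stochastic linear–quadratic example below).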

==The Partial Differential Equation==

For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is


:<math>
\frac{\partial V(x,t)}{\partial t} + \min_u \left\{ \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) + C(x,u) \right\} = 0
</math>


subject to the terminal condition


:<math>
V(x,T) = D(x).\,
</math>


As before, the unknown scalar function <math>V(x, t)</math> in the above partial differential equation is the Bellman [[value function]], which represents the cost incurred from starting in state <math>x</math> at time <math>t</math> and controlling the system optimally from then until time <math>T</math>.
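
Once <math>V</math> is known, a minimizing control can be read off pointwise from the expression inside the braces (a sketch of the standard recipe, assuming the minimum is attained):

:<math>
u^{*}(x,t) \in \arg\min_{u} \left\{ \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) + C(x,u) \right\}.
</math>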


==Deriving the Equation==


Intuitively, the HJB equation can be derived as follows. If <math>V(x(t), t)</math> is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's [[principle of optimality]], going from time ''t'' to ''t''&nbsp;+&nbsp;''dt'', we have

:<math> V(x(t), t) = \min_u \left\{V(x(t+dt), t+dt) + \int_t^{t + dt} C(x(s), u(s)) \, ds\right\}. </math>

Note that the [[Taylor expansion]] of the first term on the right-hand side is

:<math> V(x(t+dt), t+dt) = V(x(t), t) + \frac{\partial V(x, t)}{\partial t} \, dt + \frac{\partial V(x, t)}{\partial x} \cdot \dot{x}(t) \, dt + \mathcal{o}(dt),</math>

where <math>\mathcal{o}(dt)</math> denotes the terms in the Taylor expansion of higher order than one in [[Little-o notation|little-''o'' notation]]. Then if we subtract <math>V(x(t), t)</math> from both sides, divide by ''dt'', and take the limit as ''dt'' approaches zero, we obtain the HJB equation defined above.
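
Written out explicitly (an intermediate step included here for clarity), substituting the expansion and the approximation <math>\int_t^{t+dt} C(x(s),u(s))\,ds = C(x(t),u(t))\,dt + \mathcal{o}(dt)</math> into the principle of optimality and cancelling <math>V(x(t),t)</math> on both sides gives

:<math>
0 = \min_u \left\{ C(x(t),u(t))\,dt + \frac{\partial V(x,t)}{\partial t}\,dt + \frac{\partial V(x,t)}{\partial x} \cdot F(x(t),u(t))\,dt + \mathcal{o}(dt) \right\},
</math>

from which the division by <math>dt</math> and the limit <math>dt \to 0</math> yield the HJB equation stated above.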

==Solving the Equation==

The HJB equation is usually [[Backward induction|solved backwards in time]], starting from <math>t = T</math> and ending at <math>t = 0</math>.<ref>{{cite book |first1=Frank L. |last1=Lewis |first2=Draguna |last2=Vrabie |first3=Vassilis L. |last3=Syrmos |title=Optimal Control |edition=3rd |location= |publisher=Wiley |year=2012 |page=278 |isbn=978-0-470-63349-6 }}</ref>

When solved over the whole of the state space, and provided <math>V(x)</math> is continuously differentiable, the HJB equation is a [[necessary and sufficient condition]] for an optimum when the terminal state is unconstrained.<ref>{{cite book |first=Dimitri P. |last=Bertsekas |title=Dynamic Programming and Optimal Control |publisher=Athena Scientific |year=2005 }}</ref> If we can solve for <math>V</math>, then we can find from it a control <math>u</math> that achieves the minimum cost.
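
As a rough numerical illustration of this backward-in-time procedure, the following is a minimal dynamic-programming sketch on a grid (the scalar dynamics <math>F(x,u)=u</math>, the costs, and the grid sizes are hypothetical choices made only for this sketch, and the <code>numpy</code> package is assumed to be available); it is not a production HJB solver.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical scalar problem, chosen only for this sketch:
#   dynamics   x' = F(x, u) = u
#   cost rate  C(x, u) = x**2 + u**2,   terminal cost D(x) = x**2
# V(x, t) is computed on a state grid by stepping the dynamic-programming
# recursion backwards from t = T to t = 0.

T, nt = 1.0, 200                        # horizon and number of time steps
dt = T / nt
xs = np.linspace(-2.0, 2.0, 201)        # state grid
us = np.linspace(-3.0, 3.0, 61)         # candidate controls

V = xs**2                               # terminal condition V(x, T) = D(x)
for _ in range(nt):                     # backward sweep in time
    # For every candidate control, propagate each grid point one step and
    # look up the value there (linear interpolation, clamped at the edges).
    x_next = xs[None, :] + us[:, None] * dt              # shape (n_u, n_x)
    V_next = np.interp(x_next.ravel(), xs, V).reshape(x_next.shape)
    cost = (xs[None, :]**2 + us[:, None]**2) * dt + V_next
    V = cost.min(axis=0)                # Bellman minimization over u

# The minimizer from the last backward step gives an approximate feedback
# law u*(x) at the earliest time slice.
u_star = us[cost.argmin(axis=0)]
print(V[len(xs) // 2], u_star[len(xs) // 2])    # value and control at x = 0
</syntaxhighlight>

The state is discretized, the Bellman recursion is applied step by step from <math>t = T</math> back to <math>t = 0</math>, and the minimizing control at each grid point yields an approximate feedback law, in line with the discussion above.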

In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the [[viscosity solution]] ([[Pierre-Louis Lions]] and [[Michael G. Crandall|Michael Crandall]]),<ref>{{cite book |first1=Martino |last1=Bardi |first2=Italo |last2=Capuzzo-Dolcetta |title=Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations |location=Boston |publisher=Birkhäuser |year=1997 |isbn=0-8176-3640-4 }}</ref> the [[minimax solution]] ({{Interlanguage link|Andrei Izmailovich Subbotin|ru|3=Субботин,_Андрей_Измайлович}}), and others.

Approximate dynamic programming was introduced by [[Dimitri Bertsekas|D. P. Bertsekas]] and [[John Tsitsiklis|J. N. Tsitsiklis]], who used [[artificial neural network]]s ([[multilayer perceptron]]s) to approximate the Bellman function.<ref name="NeuroDynProg">{{cite book |first1=Dimitri P. |last1=Bertsekas |first2=John N. |last2=Tsitsiklis |title=Neuro-dynamic Programming |year=1996 |publisher=Athena Scientific |isbn=978-1-886529-10-6}}</ref> This is an effective strategy for mitigating the impact of dimensionality, since only the neural network parameters need to be stored rather than the complete function mapping over the whole state space. In particular, for continuous-time systems an approximate dynamic programming approach that combines policy iteration with neural networks was introduced,<ref name="CTHJB">{{cite journal |first1=Murad |last1=Abu-Khalaf |first2=Frank L.|last2=Lewis |title=Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach|year=2005 |journal=Automatica |volume=41 | issue=5 | pages=779–791|doi=10.1016/j.automatica.2004.11.034|s2cid=14757582 }}</ref> and in discrete time an approach to solving the HJB equation that combines value iteration with neural networks was introduced.<ref name="DTHJB">{{cite journal |first1=Asma |last1=Al-Tamimi|first2=Frank L.|last2=Lewis |first3=Murad |last3=Abu-Khalaf |title=Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof|year=2008 |journal=IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics |volume= 38| issue=4 | pages=943–949 |doi= 10.1109/TSMCB.2008.926614|pmid=18632382|s2cid=14202785}}</ref>

Alternatively, it has been shown that [[sum-of-squares optimization]] can yield a polynomial approximation to the solution of the Hamilton–Jacobi–Bellman equation that is arbitrarily accurate with respect to the <math>L^1</math> norm.<ref>{{cite arXiv |last1=Jones |first1=Morgan |last2=Peet |first2=Matthew |title=Polynomial Approximation of Value Functions and Nonlinear Controller Design with Performance Bounds |year=2020 |class=math.OC |eprint=2010.06828 }}</ref>

==Extension to Stochastic Problems==
The idea of solving a control problem by applying Bellman's principle of optimality and then working backwards in time to determine an optimizing strategy can be generalized to stochastic control problems. Consider a problem similar to the one above:

:<math> \min_u \mathbb E \left\{ \int_0^T C(t,X_t,u_t)\,dt + D(X_T) \right\}</math>

now with <math>(X_t)_{t \in [0,T]}\,\!</math> the stochastic process to optimize and <math>(u_t)_{t \in [0,T]}\,\!</math> the control. By first applying Bellman's principle of optimality and then expanding <math>V(X_t,t)</math> with [[Itô calculus|Itô's rule]], one finds the stochastic HJB equation

:<math>
\min_u \left\{ \mathcal{A} V(x,t) + C(t,x,u) \right\} = 0,
</math>

where <math>\mathcal{A}</math> represents the [[Infinitesimal generator (stochastic processes)|stochastic differentiation operator]], and subject to the terminal condition

:<math>
V(x,T) = D(x)\,\!.
</math>
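
For example, when the controlled state follows an Itô diffusion <math>dX_t = \mu(t,X_t,u_t)\,dt + \sigma(t,X_t,u_t)\,dW_t</math> (the coefficients <math>\mu</math> and <math>\sigma</math> are not specified above and are introduced here only as an illustrative sketch of the usual setting), the operator acts on sufficiently smooth functions as

:<math>
\mathcal{A} V(x,t) = \frac{\partial V(x,t)}{\partial t} + \mu(t,x,u) \cdot \frac{\partial V(x,t)}{\partial x} + \frac{1}{2}\operatorname{tr}\!\left[\sigma(t,x,u)\,\sigma(t,x,u)^{\top}\,\frac{\partial^{2} V(x,t)}{\partial x^{2}}\right],
</math>

which makes the stochastic HJB equation a second-order partial differential equation in <math>V</math>, as noted in the introduction.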

Note that the randomness has disappeared. In this case a solution <math>V\,\!</math> of the latter does not necessarily solve the primal problem; it is a candidate only, and a further verifying argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see for example [[Merton's portfolio problem]]).

===Application to LQG-Control===

As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics is given by
:<math>
dx_t = (a x_t + b u_t) dt + \sigma dw_t,
</math>
and the cost accumulates at rate <math>C(x_t,u_t) = r(t) u_t^2/2 + q(t) x_t^2/2</math>, the HJB equation is given by
:<math>
-\frac{\partial V(x,t)}{\partial t} = \frac{1}{2}q(t) x^2 + \frac{\partial V(x,t)}{\partial x} a x - \frac{b^2}{2 r(t)} \left(\frac{\partial V(x,t)}{\partial x}\right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x,t)}{\partial x^2},
</math>
with optimal action given by
:<math>
u_t = -\frac{b}{r(t)}\frac{\partial V(x,t)}{\partial x}.
</math>
Assuming a quadratic form for the value function, we obtain the usual [[Riccati equation]] for the Hessian of the value function, as is standard for [[Linear-quadratic-Gaussian control]].
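
Concretely, the standard substitution runs as follows (a sketch; the terminal cost is not specified above, so it is taken to be zero here for illustration). With the quadratic ansatz <math>V(x,t) = \tfrac{1}{2}\,p(t)\,x^{2} + s(t)</math>, matching coefficients in the HJB equation gives

:<math>
-\dot{p}(t) = q(t) + 2a\,p(t) - \frac{b^{2}}{r(t)}\,p(t)^{2}, \qquad -\dot{s}(t) = \frac{\sigma^{2}}{2}\,p(t),
</math>

to be integrated backwards in time from <math>p(T) = 0</math> and <math>s(T) = 0</math>, and the optimal feedback becomes <math>u_t = -\tfrac{b}{r(t)}\,p(t)\,x_t</math>.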


==See also==
* [[Bellman equation]], the discrete-time counterpart of the Hamilton–Jacobi–Bellman equation.
* [[Pontryagin's maximum principle]], a necessary but not sufficient condition for an optimum obtained by maximizing a [[Hamiltonian (control theory)|Hamiltonian]]; it has the advantage over the HJB equation of needing to hold only along the single trajectory being considered.


==References==
{{Reflist}}


==Further reading==
* {{cite book
|first= Dimitri P. |last=Bertsekas | author-link = Dimitri P. Bertsekas
| year = 2005
| title = Dynamic Programming and Optimal Control
| publisher = Athena Scientific
}}
* {{cite book |first=Huyên |last=Pham |chapter=The Classical PDE Approach to Dynamic Programming |title=Continuous-time Stochastic Control and Optimization with Financial Applications |publisher=Springer |year=2009 |isbn=978-3-540-89499-5 |pages=37–60 |chapter-url=https://books.google.com/books?id=xBsagiBp1SYC&pg=PA37 }}
* {{cite book |first=Robert F. |last=Stengel |chapter=Conditions for Optimality |title=Optimal Control and Estimation |location=New York |publisher=Dover |year=1994 |isbn=0-486-68200-5 |pages=201–222 |chapter-url=https://books.google.com/books?id=jDjPxqm7Lw0C&pg=PA201 }}


{{DEFAULTSORT:Hamilton-Jacobi-Bellman equation}}
[[Category:Control theory]]
[[Category:Partial differential equations]]
[[Category:Optimal control]]
[[Category:Dynamic programming]]
[[Category:Stochastic control]]
[[Category:William Rowan Hamilton]]
