Hamilton–Jacobi–Bellman equation
The Hamilton–Jacobi–Bellman (HJB) equation is a partial differential equation which is central to optimal control theory.[1] The solution of the HJB equation is the value function which gives the minimum cost for a given dynamical system with an associated cost function.
When solved locally, the HJB is a necessary condition, but when solved over the whole of state space, the HJB equation is a necessary and sufficient condition for an optimum. The solution is open loop, but it also permits the solution of the closed loop problem. The HJB method can be generalized to stochastic systems as well.[2]
Classical variational problems, for example the brachistochrone problem, can be solved using this method.[3]
The equation is a result of the theory of dynamic programming which was pioneered in the 1950s by Richard Bellman and coworkers.[4] The corresponding discrete-time equation is usually referred to as the Bellman equation. In continuous time, the result can be seen as an extension of earlier work in classical physics on the Hamilton–Jacobi equation by William Rowan Hamilton and Carl Gustav Jacob Jacobi.
Optimal control problems
Consider the following problem in deterministic optimal control over the time period <math>[0,T]</math>:
:<math> V(x(0), 0) = \min_u \left\{ \int_0^T C[x(t),u(t)] \, dt + D[x(T)] \right\}, </math>
where <math>C[x(t),u(t)]</math> is the scalar cost rate function, <math>D[x(T)]</math> is a function that gives the economic value or utility at the final state, <math>x(t)</math> is the system state vector, <math>x(0)</math> is assumed given, and <math>u(t)</math> for <math>0 \le t \le T</math> is the control vector that we are trying to find.
The system must also be subject to
:<math> \dot{x}(t) = F[x(t),u(t)], </math>
where <math>F[x(t),u(t)]</math> gives the vector determining physical evolution of the state vector over time.
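To make the ingredients concrete, the following sketch (an added illustration, not from the article; the scalar choices of C, D, and F are hypothetical) evaluates the cost functional for one candidate control law by Euler integration. The value function is the minimum of this quantity over all admissible controls.

```python
# A minimal sketch with hypothetical ingredients:
#   F(x, u) = u               -- state dynamics
#   C(x, u) = x**2 + u**2     -- running cost rate
#   D(x)    = 0               -- terminal cost
# The cost of one candidate control law is evaluated by Euler integration;
# V(x(0), 0) is the minimum of this quantity over all admissible controls.

def F(x, u):
    return u

def C(x, u):
    return x**2 + u**2

def D(x):
    return 0.0

def cost_of_control(u_law, x0=1.0, T=1.0, n_steps=1000):
    dt = T / n_steps
    x, total = x0, 0.0
    for k in range(n_steps):
        u = u_law(x, k * dt)
        total += C(x, u) * dt      # accumulate the running cost
        x += F(x, u) * dt          # Euler step of dx/dt = F(x, u)
    return total + D(x)            # add the terminal cost D(x(T))

# Cost of a simple proportional feedback law, one candidate among many:
print(cost_of_control(lambda x, t: -x))
```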
The partial differential equation
For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is
:<math> \dot{V}(x,t) + \min_u \left\{ \nabla V(x,t) \cdot F(x, u) + C(x,u) \right\} = 0, </math>
subject to the terminal condition
:<math> V(x,T) = D(x), </math>
where <math>\dot{V}(x,t)</math> means the partial derivative of <math>V</math> with respect to the time variable <math>t</math>, <math>a \cdot b</math> means the dot product of the vectors <math>a</math> and <math>b</math>, and <math>\nabla V</math> the gradient of <math>V</math> with respect to the variables <math>x</math>.
The unknown scalar <math>V(x, t)</math> in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state <math>x</math> at time <math>t</math> and controlling the system optimally from then until time <math>T</math>.
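As an illustration (an added example, not part of the original text), take the scalar system <math>\dot{x} = u</math> with cost rate <math>C(x,u) = \tfrac{1}{2}(x^2 + u^2)</math> and no terminal cost. The HJB equation becomes
:<math> \dot{V}(x,t) + \min_u \left\{ V_x(x,t)\, u + \tfrac{1}{2}\left(x^2 + u^2\right) \right\} = 0, </math>
the inner minimum is attained at <math>u^* = -V_x(x,t)</math>, and the equation reduces to
:<math> \dot{V}(x,t) - \tfrac{1}{2} V_x(x,t)^2 + \tfrac{1}{2} x^2 = 0, \qquad V(x,T) = 0. </math>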
Deriving the equation
Intuitively, the HJB equation can be derived as follows. If <math>V(x(t), t)</math> is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time t to t + dt, we have
:<math> V(x(t), t) = \min_u \left\{ V(x(t+dt), t+dt) + \int_t^{t+dt} C(x(s), u(s)) \, ds \right\}. </math>
Note that the Taylor expansion of the first term on the right-hand side is
:<math> V(x(t+dt), t+dt) = V(x(t), t) + \dot{V}(x(t), t) \, dt + \nabla V(x(t), t) \cdot \dot{x}(t) \, dt + o(dt), </math>
where <math>o(dt)</math> denotes the terms in the Taylor expansion of higher order than one in little-o notation. Then if we subtract <math>V(x(t), t)</math> from both sides, divide by dt, and take the limit as dt approaches zero, we obtain the HJB equation defined above.
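Spelling out that step (a restatement of the argument above, with the integral approximated by <math>C(x(t),u(t))\,dt</math> and <math>\dot{x}(t) = F(x(t),u(t))</math> substituted): subtracting <math>V(x(t), t)</math> leaves
:<math> 0 = \min_u \left\{ C(x(t), u(t)) \, dt + \dot{V}(x(t), t) \, dt + \nabla V(x(t), t) \cdot F(x(t), u(t)) \, dt + o(dt) \right\}, </math>
and dividing by dt and letting dt approach zero gives
:<math> \dot{V}(x, t) + \min_u \left\{ \nabla V(x, t) \cdot F(x, u) + C(x, u) \right\} = 0, </math>
which is the HJB equation stated above.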
Solving the equation
The HJB equation is usually solved backwards in time, starting from <math>t = T</math> and ending at <math>t = 0</math>.
When solved over the whole of state space and <math>V(x,t)</math> is continuously differentiable, the HJB equation is a necessary and sufficient condition for an optimum when the terminal state is unconstrained.[5] If we can solve for <math>V</math>, then we can find from it a control <math>u</math> that achieves the minimum cost.
In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the viscosity solution (Pierre-Louis Lions and Michael Crandall), the minimax solution (Andrei Izmailovich Subbotin), and others.
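As a sketch of the backward-in-time computation (an added illustration, not from the article; the scalar problem and grid sizes are arbitrary choices), the value function for <math>\dot{x} = u</math>, <math>C(x,u) = \tfrac{1}{2}(x^2+u^2)</math>, <math>D \equiv 0</math> can be approximated by applying the principle of optimality over successive small steps, starting from the terminal condition at <math>t = T</math>:

```python
import numpy as np

# Backward-in-time dynamic programming for the scalar problem
#   dx/dt = u,  C(x, u) = (x^2 + u^2)/2,  D(x) = 0  on [0, T].
# The value function is computed on a state grid by applying Bellman's
# principle of optimality over each small step dt (a semi-Lagrangian
# discretization of the HJB equation).

T, dt = 1.0, 0.01
xs = np.linspace(-2.0, 2.0, 201)     # state grid
us = np.linspace(-4.0, 4.0, 81)      # candidate control values

V = np.zeros_like(xs)                # terminal condition V(x, T) = D(x) = 0
for _ in range(int(round(T / dt))):  # march backwards from t = T to t = 0
    V_next = V
    candidates = []
    for u in us:
        x_next = np.clip(xs + u * dt, xs[0], xs[-1])   # Euler step of the dynamics
        running = 0.5 * (xs ** 2 + u ** 2) * dt        # cost accumulated over dt
        candidates.append(running + np.interp(x_next, xs, V_next))
    V = np.min(candidates, axis=0)                     # Bellman minimization over u

# For this problem the exact value function is V(x, 0) = 0.5 * tanh(T) * x^2,
# so the grid value near x = 1 should be close to 0.5 * tanh(1) ≈ 0.38.
print(V[np.searchsorted(xs, 1.0)], 0.5 * np.tanh(T))
```

The minimizing control at each grid point also yields an approximation of the optimal feedback law.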
Extension to stochastic problems
The idea of solving a control problem by applying Bellman's principle of optimality and then working out backwards in time an optimizing strategy can be generalized to stochastic control problems. Consider a problem similar to the one above:
:<math> \min_u \; \mathbb{E} \left\{ \int_0^T C(t, X_t, u_t) \, dt + D(X_T) \right\}, </math>
now with <math>(X_t)_{t \in [0,T]}</math> the stochastic process to optimize and <math>(u_t)_{t \in [0,T]}</math> the steering. By first using Bellman and then expanding <math>V(X_t, t)</math> with Itô's rule, one finds the stochastic HJB equation
:<math> \min_u \left\{ \mathcal{A} V(x,t) + C(t, x, u) \right\} = 0, </math>
where <math>\mathcal{A}</math> represents the stochastic differentiation operator, and subject to the terminal condition
:<math> V(x,T) = D(x). </math>
Note that the randomness has disappeared. In this case a solution <math>V</math> of the latter does not necessarily solve the primal problem; it is a candidate only, and a further verifying argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see for example Merton's portfolio problem).
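For concreteness (a standard Itô-calculus identity added here as an illustration; the notation <math>\mu</math>, <math>\sigma</math>, <math>W_t</math> is not fixed by the article), if the controlled state follows the Itô diffusion
:<math> dX_t = \mu(X_t, u_t) \, dt + \sigma(X_t, u_t) \, dW_t, </math>
then expanding <math>V(X_t, t)</math> with Itô's rule shows that the operator acts as
:<math> \mathcal{A} V(x,t) = \frac{\partial V}{\partial t}(x,t) + \mu(x,u) \cdot \nabla V(x,t) + \frac{1}{2} \operatorname{tr}\!\left( \sigma(x,u) \sigma(x,u)^{\top} \nabla^2 V(x,t) \right), </math>
so the minimization in the stochastic HJB equation runs over the control entering the drift and diffusion coefficients.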
Application to LQG Control
As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics is given by
:<math> dx_t = (a x_t + b u_t) \, dt + \sigma \, dw_t, </math>
and the cost accumulates at rate <math>C(x_t, u_t) = r(t) u_t^2/2 + q(t) x_t^2/2</math>, the HJB equation is given by
:<math> -\frac{\partial V(x,t)}{\partial t} = \frac{1}{2} q(t) x^2 + \frac{\partial V(x,t)}{\partial x} a x - \frac{b^2}{2 r(t)} \left( \frac{\partial V(x,t)}{\partial x} \right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x,t)}{\partial x^2}, </math>
with optimal action given by
:<math> u_t = -\frac{b}{r(t)} \frac{\partial V(x,t)}{\partial x}. </math>
Assuming a quadratic form for the value function, we obtain the usual Riccati equation for the Hessian of the value function, as in linear-quadratic-Gaussian control.
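Spelling this out (the notation <math>P(t)</math>, <math>g(t)</math> is introduced here for illustration and is not fixed by the article): substituting the quadratic ansatz
:<math> V(x,t) = \tfrac{1}{2} P(t) \, x^2 + g(t) </math>
into the HJB equation above and matching the coefficients of <math>x^2</math> and of the constant term gives
:<math> \dot{P}(t) = \frac{b^2}{r(t)} P(t)^2 - 2 a P(t) - q(t), \qquad \dot{g}(t) = -\frac{\sigma^2}{2} P(t), </math>
the first of which is the scalar Riccati equation, integrated backwards in time from its terminal condition at <math>t = T</math>; the optimal feedback is then <math>u_t = -\frac{b}{r(t)} P(t) \, x_t</math>.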
See also
- Bellman equation, discrete-time counterpart of the Hamilton–Jacobi–Bellman equation
- Pontryagin's minimum principle, a necessary but not sufficient condition for an optimum, obtained by minimizing a Hamiltonian; it has the advantage over the HJB equation of needing to hold only along the single trajectory being considered.
References
- ^ Kirk, Donald E. (1970). Optimal Control Theory: An Introduction. Englewood Cliffs, NJ: Prentice-Hall. pp. 86–90. ISBN 0-13-638098-0.
- ^ Chang, Fwu-Ranq (2004). Stochastic Optimization in Continuous Time. Cambridge, UK: Cambridge University Press. pp. 114–121. ISBN 0-521-83406-6.
- ^ Kemajou-Brown, Isabelle (2016). "Brief History of Optimal Control Theory and Some Recent Developments". In Budzban, Gregory; Hughes, Harry Randolph; Schurz, Henri (eds.). Probability on Algebraic and Geometric Structures. Contemporary Mathematics. Vol. 668. pp. 119–130. doi:10.1090/conm/668/13400.
- ^ Bellman, R. E. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
- ^ Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
- Bellman, R. E. (1954). "Dynamic Programming and a new formalism in the calculus of variations". Proc. Natl. Acad. Sci. 40 (4): 231–235. doi:10.1073/pnas.40.4.231. PMC 527981. PMID 16589462.
- Bellman, R. E. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
- Bellman, R.; Dreyfus, S. (1959). "An Application of Dynamic Programming to the Determination of Optimal Satellite Trajectories". J. Brit. Interplanet. Soc. 17: 78–83.
Further reading
- Bertsekas, Dimitri P. (2005). Dynamic programming and optimal control. Athena Scientific.