Douglas B. Tweed
Optimal control, learning algorithms, nonlinear systems
A promising approach to optimal control is to start with a non-optimal controller u^(1) and improve it. One very efficient example is the method of generalized Hamilton–Jacobi–Bellman (GHJB) equations, which learns an approximation to the gradient ∇J^(1) of the cost-to-go function of u^(1), uses that gradient to define a better controller u^(2), and then repeats, creating a sequence u^(n) that converges to the optimal controller. Here we point out that GHJB works indirectly, in the sense that it does not learn the best approximation to ∇J but instead learns the time derivative dJ/dt and infers ∇J from it. We show that we can get lower-cost controllers with fewer adjustable parameters by learning ∇J directly. We then compare this direct method with GHJB on test problems from the literature.
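To make the iteration concrete, here is a sketch of the standard GHJB step for a control-affine plant; the dynamics ẋ = f(x) + g(x)u and the cost rate q(x) + uᵀRu are illustrative assumptions, not details taken from this paper. For the current controller u^(n), the GHJB equation is

    ∇J^(n)ᵀ (f(x) + g(x)u^(n)) + q(x) + u^(n)ᵀ R u^(n) = 0,

whose first term is the time derivative dJ^(n)/dt along closed-loop trajectories, and the learned gradient then defines the improved controller

    u^(n+1) = -(1/2) R⁻¹ g(x)ᵀ ∇J^(n).

The direct method described above fits ∇J itself rather than recovering it from the dJ/dt relation.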