OPTIMAL CONTROL BY DIRECT APPROXIMATION OF THE GRADIENT OF THE COST-TO-GO

Douglas B. Tweed

References

  1. [1] R.E. Bellman, Adaptive control processes (Princeton, NJ: Princeton University Press, 1961).
  2. [2] F. Cucker and S. Smale, On the mathematical foundations of learning, Bulletin of the American Mathematical Society, 39, 2001, 1–49.
  3. [3] R.S. Sutton and A.G. Barto, Reinforcement learning (Cambridge, MA: MIT Press, 1998).
  4. [4] R.S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in Advances in neural information processing systems (Cambridge, MA: MIT Press, 2000).
  5. [5] G. Saridis and C.S. Lee, An approximation theory of optimal control for trainable manipulators, IEEE Transactions on Systems, Man, Cybernetics, 9(3), 1979, 152–159.
  6. [6] R. Beard, G. Saridis, and J. Wen, Galerkin approximations of the generalized Hamilton–Jacobi–Bellman equation, Automatica, 33(12), 1997, 2159–2177.
  7. [7] M. Abu-Khalaf and F.L. Lewis, Nearly optimal state feedback control of constrained nonlinear systems using a neural networks HJB approach, Annual Reviews in Control, 28, 2004, 239–251.
  8. [8] M. Abu-Khalaf and F.L. Lewis, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41, 2005, 779–791.
  9. [9] F.L. Lewis and M. Abu-Khalaf, A Hamilton–Jacobi setup for constrained neural network control, IEEE Int. Symposium on Intelligent Control, Houston, TX, 2003.
  10. [10] S.E. Lyshevski, Optimal control of nonlinear continuous-time systems: Design of bounded controllers via generalized nonquadratic functionals, Proc. of American Control Conf., Philadelphia, Pennsylvania, USA, 1998.
  11. [11] S. Lyshevski, Control systems with engineering applications (Boston, MA: Birkhäuser, 2001).
  12. [12] E. Todorov, Optimality principles in sensorimotor control, Nature Neuroscience, 7(9), 2004, 907–915.
  13. [13] A.E. Bryson and Y.C. Ho, Applied optimal control (Washington, DC: Hemisphere, 1975).
  14. [14] M. French, C. Szepesvári, and E. Rogers, Performance of nonlinear approximate adaptive controllers (Chichester: Wiley, 2003).
  15. [15] D. Jackson, The general theory of approximation by polynomials and trigonometric sums, Bulletin of the American Mathematical Society, 27, 1920–21, 415–431.
  16. [16] K.S. Narendra and A.M. Annaswamy, Stable adaptive systems (Englewood Cliffs, NJ: Prentice-Hall, 1989).
  17. [17] K.J. Åström and B. Wittenmark, Adaptive control (Reading, MA: Addison-Wesley, 1995).
  18. [18] W. Liu, J. Príncipe, and S. Haykin, Kernel adaptive filtering (Hoboken, NJ, USA: Wiley, 2010).
  19. [19] D.E. Kirk, Optimal control theory (Englewood Cliffs, NJ: Prentice-Hall, 1970).
  20. [20] J.J.E. Slotine and W. Li, Applied nonlinear control (Upper Saddle River, NJ: Prentice-Hall, 1991).
  21. [21] W. Powell, Approximate dynamic programming (Hoboken, NJ: Wiley, 2007).
  22. [22] F.L. Lewis, S. Jagannathan, and A. Yesildirek, Neural network control of robot manipulators and nonlinear systems (London: Taylor and Francis, 1999).
  23. [23] M.I. Alomoush, Fractional calculus-based optimal controllers of automatic voltage regulator in power system, Control and Intelligent Systems, 38, 2010, DOI: 10.2316/Journal.201.2010.1.201-2179.
  24. [24] A. Fekih, Improved LQR-based control approach for high performance induction motor drives, Control and Intelligent Systems, 2009, DOI: 10.2316/Journal.201.2009.1.201-2014.
  25. [25] N.S. Bhuvaneswari, G. Uma, and T.R. Rangaswamy, Neural network with dynamic programming for time-optimal control of conical water level, Control and Intelligent Systems, 2009, DOI: 10.2316/Journal.201.2009.4.201-1939.
  26. [26] J. Peters, R. Tedrake, N. Roy, and J. Morimoto, Robot learning, in C. Sammut and G.I. Webb (eds.), Encyclopedia of machine learning (Boston, MA: Springer, 2010), 865–869.
  27. [27] D. Tweed, Visual-motor optimization in binocular control, Vision Research, 37(14), 1997, 1939–1951.
  28. [28] D. Tweed, T. Haslwanter, and M. Fetter, Optimizing gaze control in three dimensions, Science, 281, 1998, 1363–1366.
  29. [29] C.M. Harris and D.M. Wolpert, Signal-dependent noise determines motor planning, Nature, 394, 1998, 780–784.
  30. [30] E. Todorov and M.I. Jordan, Optimal feedback control as a theory of motor coordination, Nature Neuroscience, 5, 2002, 1226–1235.
