Overcoming Value Overestimation for Distributional Reinforcement Learning-Based Path Planning with Conservative Constraints, 124–132.
Yuwan Gu, Yongtao Chu, Fang Meng, Yan Chen, Jidong Lv, and Shoukun Xu
DOI: 10.2316/J.2025.206-1114
From: (206) International Journal of Robotics and Automation, 2025