OVERCOMING VALUE OVERESTIMATION FOR DISTRIBUTIONAL REINFORCEMENT LEARNING-BASED PATH PLANNING WITH CONSERVATIVE CONSTRAINTS

Yuwan Gu, Yongtao Chu, Fang Meng, Yan Chen, Jidong Lv, and Shoukun Xu

Keywords

Distributional reinforcement learning, conservative constraints, quantile network, path planning

Abstract

Navigating mobile robots through unknown environments without prior knowledge poses a significant challenge in path planning. To address the lack of value distributional information in reinforcement learning's (RL's) actor–critic framework, we propose an algorithm leveraging distributional reinforcement learning (Distributional RL). This approach, operating under conservative constraints to mitigate value overestimation, employs a neural network based on quantile regression to learn the full reward distribution. To counter the risk of converging to a suboptimal policy due to value function overestimation, we introduce a conservative framework using KL divergence and adjust the learned distributions for effective path planning. Within a simulated environment, a random reconstruction module enhances policy generalisation. Comparative experiments with SAC, TD3, DARC, and DSAC show that our method performs comparably to TD3 in sparse-obstacle environments and outperforms all baselines in dense-obstacle settings.
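
The abstract refers to a critic trained by quantile regression over the return/reward distribution. The following is a minimal PyTorch sketch of such a quantile critic together with the standard quantile Huber loss used in quantile-regression RL, given for illustration only; the class and function names, network sizes, and the omission of the paper's KL-based conservative term and random reconstruction module are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class QuantileCritic(nn.Module):
    """Critic that outputs N quantile estimates of the return for a state-action pair."""

    def __init__(self, state_dim, action_dim, n_quantiles=32, hidden=256):
        super().__init__()
        self.n_quantiles = n_quantiles
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )

    def forward(self, state, action):
        # (batch, n_quantiles) tensor of quantile values.
        return self.net(torch.cat([state, action], dim=-1))


def quantile_huber_loss(pred, target, kappa=1.0):
    """Quantile Huber loss between predicted quantiles (batch, N) and
    target quantiles (batch, N'), as in quantile-regression RL."""
    n = pred.shape[1]
    # Midpoint quantile fractions tau_i = (2i + 1) / (2N).
    taus = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors: (batch, N', N), targets along dim 1, predictions along dim 2.
    td = target.unsqueeze(2) - pred.unsqueeze(1)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weighting by |tau - 1{td < 0}| gives the quantile regression behaviour.
    loss = (taus.view(1, 1, -1) - (td.detach() < 0).float()).abs() * huber / kappa
    return loss.mean()
```

A conservative variant in the spirit of the paper could, under the same assumptions, add a penalty term (e.g. a KL divergence between the predicted and a reference distribution) to this loss to discourage overestimated value distributions.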
