
Details

A dual-arm cooperative control method based on improved proximal policy optimization  (indexed in SCI-EXPANDED)

Document type: Journal article

English title: A dual-arm cooperative control method based on improved proximal policy optimization

Authors: Su, Man; Yuan, Qingni; Qu, Pengju; Wang, Chao; Zhou, Yinjiang

First author: Su, Man

Corresponding author: Yuan, QN[1]

Affiliations: [1]Guizhou Univ, Guiyang 550025, Guizhou, Peoples R China; [2]Guizhou Inst Technol, Guiyang 550025, Guizhou, Peoples R China

First affiliation: Guizhou Univ, Guiyang 550025, Guizhou, Peoples R China

Corresponding address: (corresponding author), Guizhou Univ, Guiyang 550025, Guizhou, Peoples R China.

Year: 2025

Volume: 38

Issue: 2

Journal: JOURNAL OF KING SAUD UNIVERSITY COMPUTER AND INFORMATION SCIENCES

Indexed in: WOS: SCI-EXPANDED (Accession No. WOS:001674346400002)

Funding: This research was supported by The National Natural Science Foundation of China (Grant Nos. 52065010 and 52165063), The Department of Science and Technology of Guizhou Province (Grant Nos. [2023]G094, [2023]G125, and [2024]K154), and The Guizhou Provincial Key Laboratory Construction Program (QKH Platform ZSYS[2025]012). Additional support was provided by The Graduate Scientific Research and Innovation Program of Guizhou University (Project No. YKJP202306), as well as the Science and Technology Project titled "Research and Application of an Electronic Ledger System for Case-Related Cigarettes Based on Digital Management" (Qianyanzhuke [2024] No. 4, 2024-07).

Language: English

Keywords: Proximal policy optimization; Dual-arm collaborative control; Hierarchical reinforcement learning

Abstract: Dual-arm collaborative tasks involve high-dimensional control and complex multi-stage decision-making, and the single constraint mechanism of traditional Proximal Policy Optimization (PPO) leads to policy bias and insufficient convergence efficiency. To address these challenges, this paper proposes a dual-arm collaborative control method based on an improved Proximal Policy Optimization algorithm. Building on deep reinforcement learning, the state space and action space of the dual-arm system are first defined, and a perception-decision-update closed-loop interaction mechanism is constructed. A Hierarchical Constrained Hybrid Proximal Policy Optimization algorithm (HCH-PPO) is then proposed, which designs a dual-timescale hierarchical policy, establishes dynamic hybrid constraints, and incorporates an adaptive parameter adjustment mechanism. While retaining the efficiency of PPO, the algorithm introduces Trust Region Policy Optimization (TRPO) to improve the stability of the optimization process and the policy exploration capability; this hierarchical optimization framework enables efficient state-to-action mapping learning. Experimental results demonstrate that, compared with traditional PPO, the proposed method improves convergence speed by 56.82% and task success rate by 12% in dual-arm collaborative grasping and placing tasks, indicating a significant performance gain.
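The abstract describes combining PPO's clipped surrogate objective with a TRPO-style trust-region constraint. The sketch below is an assumption-based illustration of that general combination, not the paper's actual HCH-PPO implementation (the dual-timescale hierarchy, dynamic hybrid constraints, and adaptive parameter mechanism are not detailed in this record): it simply adds a KL-divergence penalty, the quantity TRPO bounds, to the standard clipped surrogate.

```python
import numpy as np

def hybrid_ppo_surrogate(ratio, advantage, kl, clip_eps=0.2, kl_coef=0.5):
    """Illustrative hybrid objective: PPO clipped surrogate minus a KL penalty.

    ratio     -- per-sample probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- per-sample advantage estimates
    kl        -- per-sample KL divergence between old and new policies
    clip_eps, kl_coef -- hypothetical hyperparameters, not taken from the paper
    """
    # Standard PPO clipped surrogate: take the pessimistic (min) of the
    # unclipped and clipped importance-weighted advantages.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * advantage, clipped * advantage)
    # TRPO-inspired term: penalize divergence from the old policy, softly
    # enforcing a trust region instead of a hard constraint.
    return np.mean(surrogate - kl_coef * kl)
```

With zero KL the value reduces to the familiar PPO clipped objective, so the penalty only activates as the new policy drifts from the old one.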


Copyright © Guizhou Institute of Technology; Chongqing VIP Information Co., Ltd.