DUAN Longjin, WANG Guiyong, WANG Weichao, et al. Energy Management Strategy of an Extended Range Electric Light Truck Based on Deep Reinforcement Learning[J]. Chinese Internal Combustion Engine Engineering, 2023, 44(6): 90-99.
Energy Management Strategy of an Extended Range Electric Light Truck Based on Deep Reinforcement Learning
DOI:10.13949/j.cnki.nrjgc.2023.06.011
Key Words: deep Q-network (DQN); deep deterministic policy gradient (DDPG); twin delayed deep deterministic policy gradient (TD3) algorithm; extended range electric light truck
Funding: National Natural Science Foundation of China (52066008); Yunnan Provincial Department of Science and Technology Open Competition Mechanism Project (202104BN050007); Yunnan Provincial Science and Technology Plan Project (202102AC080004)
Author  Affiliation  E-mail
DUAN Longjin*  Yunnan Province Key Laboratory of Internal Combustion Engines, Kunming University of Science and Technology, Kunming 650500  1456249466@qq.com
WANG Guiyong*  Yunnan Province Key Laboratory of Internal Combustion Engines, Kunming University of Science and Technology, Kunming 650500  wangguiyong@kust.edu.cn
WANG Weichao  Yunnan Province Key Laboratory of Internal Combustion Engines, Kunming University of Science and Technology, Kunming 650500  3262386925@qq.com
HE Shuchao  Kunming Yunnei Power Co., Ltd., Kunming 650500  3564097974@qq.com
Abstract: To solve the problem of rational energy allocation between the auxiliary power unit (APU) and the power battery of an extended range electric light truck, a control-oriented simulation model was established in Simulink and a real-time energy management strategy (EMS) based on the twin delayed deep deterministic policy gradient (TD3) algorithm was proposed. With engine fuel consumption and battery state of charge (SOC) variation as the optimization objectives, the deep reinforcement learning agent was trained on the world light vehicle test procedure (WLTP) cycle. Simulation results show that the TD3-based EMS, validated under different driving cycles, has good stability and adaptability, and that the TD3 algorithm achieves continuous control of engine speed and torque, yielding smoother output power. Compared with EMSs based on the conventional deep Q-network (DQN) algorithm and the deep deterministic policy gradient (DDPG) algorithm, the TD3-based EMS improves fuel economy by 12.35% and 0.67% respectively, reaching 94.85% of that of the dynamic programming (DP) based strategy, and its convergence speed is improved by 40.00% and 47.60% respectively.
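The abstract names the optimization objectives (engine fuel consumption and battery SOC variation) and a continuous action space (engine speed and torque) but does not give the reward formulation. Below is a minimal Python sketch of how such a reward and action clipping could look for a TD3 agent; the weights alpha and beta, the SOC target, the operating limits, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an EMS reward and action mapping for a TD3 agent
# (illustrative only; not the paper's code).
import numpy as np

def ems_reward(fuel_rate_g_per_s: float,
               soc: float,
               soc_target: float = 0.5,   # assumed SOC reference
               alpha: float = 1.0,        # assumed weight on fuel consumption
               beta: float = 50.0):       # assumed weight on SOC deviation
    """Negative cost: penalize instantaneous fuel use and SOC drift from target."""
    return -(alpha * fuel_rate_g_per_s + beta * (soc - soc_target) ** 2)

def clip_action(raw_action: np.ndarray,
                speed_range=(1000.0, 3000.0),   # assumed engine speed limits, r/min
                torque_range=(0.0, 200.0)):     # assumed engine torque limits, N*m
    """Map the TD3 actor's continuous output to feasible engine speed and torque."""
    speed = float(np.clip(raw_action[0], *speed_range))
    torque = float(np.clip(raw_action[1], *torque_range))
    return speed, torque

if __name__ == "__main__":
    # One control step with illustrative values
    speed, torque = clip_action(np.array([1800.0, 120.0]))
    print(ems_reward(fuel_rate_g_per_s=1.2, soc=0.48), speed, torque)
```

Because TD3 outputs continuous actions, the engine operating point can be adjusted smoothly at every step, which is consistent with the smoother APU output power reported in the abstract relative to the discrete-action DQN strategy.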