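Zhang Song, Wang Kunyu, Yang Rong, Huang Wei. Research on Energy Management Strategy for Hybrid Electric Bus Based on Deep Reinforcement Learning [J]. Chinese Internal Combustion Engine Engineering, 2021, 42(6): 10-16.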
Research on Energy Management Strategy for Hybrid Electric Bus Based on Deep Reinforcement Learning
DOI:10.13949/j.cnki.nrjgc.2021.06.002
Key Words: dual planetary gear set  hybrid powertrain  energy management  deep reinforcement learning
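Funding: National Key R&D Program of China (2017YFE0102800); Guangxi Science and Technology Base and Talent Special Project (AD19110019); Guangxi Innovation-Driven Development Special Project (AA182420453)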
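Authors and affiliations
Zhang Song, Wang Kunyu, Yang Rong, Huang Wei
1. Guangxi Yuchai Machinery Co., Ltd., Yulin 537005
2. School of Mechanical Engineering, Guangxi University, Nanning 530004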
Abstract: Taking a hybrid electric bus with a dual planetary gear set as the prototype vehicle, energy management strategies based on the double deep Q-network (DDQN) and on the twin delayed deep deterministic policy gradient (TD3) are proposed for discrete control and continuous control, respectively, of the control variable, diesel engine speed. Prioritized experience replay is used to optimize both strategies. The energy management characteristics of the vehicle under the C-WTVC driving cycle are studied by simulation. Comparison with a dynamic programming (DP) strategy shows that the DDQN and TD3 strategies converge quickly and adapt well. Like the DP strategy, both use pure electric drive at low speed and relatively low torque, and hybrid drive at high speed and relatively high torque. Under all three strategies the diesel engine operates mainly in the low-to-medium speed range, and the TD3 strategy can control diesel engine speed continuously. The fuel consumption of the DDQN and TD3 strategies is 19.51 L/100 km and 19.48 L/100 km (0.1951 L/km and 0.1948 L/km), respectively, and the fuel economy of both reaches 93% of that of the DP strategy, which demonstrates the effectiveness of the DDQN and TD3 strategies.
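The abstract names DDQN with prioritized experience replay but does not give the update rule, so the following is only a minimal sketch of the standard double DQN target and of proportional prioritized sampling, not the authors' implementation. The function names, the discount factor `gamma`, the exponent `alpha`, the offset `eps`, and the toy batch of three discrete engine-speed actions are all hypothetical choices made for illustration.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    # Action selection with the online network's Q-values for the next state.
    best_actions = np.argmax(q_online_next, axis=1)
    # Action evaluation with the target network's Q-values.
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    # Terminal transitions (done = 1) do not bootstrap.
    return rewards + gamma * (1.0 - dones) * next_values

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i**alpha / sum_k p_k**alpha,
    with priority p_i = |TD error_i| + eps."""
    priorities = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    return priorities / priorities.sum()

# Toy batch: 2 transitions, 3 discrete engine-speed actions (hypothetical values).
rewards = [-1.0, -0.5]
dones = [0, 1]
q_online_next = [[0.2, 0.9, 0.1], [0.3, 0.1, 0.4]]
q_target_next = [[0.5, 0.6, 0.7], [1.0, 2.0, 3.0]]

targets = double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.9)
probs = per_probabilities([0.1, 2.0])
```

Separating action selection from action evaluation is what distinguishes double DQN from plain DQN. Prioritized replay then samples transitions with larger temporal-difference error more often.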