0. Title
- Comparison of Deep Reinforcement Learning and Model Predictive Control for Adaptive Cruise Control
1. Authors
- Yuan Lin
2. Abstract
- Compares deep reinforcement learning (DRL) and model predictive control (MPC) for adaptive cruise control in car-following scenarios; the DRL controller uses DDPG and the MPC is solved with IPO
- Results show that DRL performs comparably to MPC with no modeling errors and a sufficiently long prediction horizon, provided the test inputs lie within the training data range
- In particular, the DRL episode cost is only 5.8% higher than the benchmark solution obtained by optimizing the entire episode with IPO
- DRL control performance deteriorates when test inputs fall outside the training data range, indicating inadequate generalization
- When the MPC is subject to control delay, disturbances, or modeling errors, DRL performs better
- When the MPC's modeling errors are small, DRL performs similarly to MPC
3. Motivation
- Which is better: MPC with no model error or disturbance, or DRL whose test conditions lie within what it has already learned?
4. Contributions(Findings)
- Clarifies the practical differences between MPC and DRL on the same car-following task: computation time, hard-constraint handling, modeling-error tolerance, and generalization
5. Methodology
- DRL: a DDPG agent trained on car-following episodes; MPC: receding-horizon optimal control solved with IPO; both are evaluated against a benchmark obtained by optimizing the entire episode with IPO (see the sketch below)
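- A minimal sketch of the comparison setup as I understand it (my own reconstruction, not the paper's code): both controllers are dropped into the same car-following rollout and scored with a quadratic gap-tracking plus control-effort cost. The point-mass dynamics, 0.1 s step, 2 s time-gap target, and cost weights are assumptions; any `controller(state) -> accel` callable (the trained DDPG actor or the MPC's first move) can be plugged in.
```python
# Reconstruction sketch only -- dynamics, step size, time-gap target,
# and cost weights are assumptions, not taken from the paper.
import numpy as np

DT = 0.1  # control step [s] (assumed)

def step(state, accel, lead_speed_next):
    """Car-following update: state = (gap, ego_speed, lead_speed)."""
    gap, v_ego, v_lead = state
    return np.array([gap + (v_lead - v_ego) * DT,   # gap integrates the speed difference
                     v_ego + accel * DT,            # ego speed integrates commanded accel
                     lead_speed_next])              # lead speed comes from the test profile

def run_episode(controller, lead_speeds, init_state):
    """Roll out one episode with any controller(state) -> accel callable
    (e.g. the DDPG actor or the MPC's first move) and accumulate a
    quadratic gap-tracking + control-effort cost."""
    state, cost = np.array(init_state, dtype=float), 0.0
    for v_lead_next in lead_speeds:
        accel = float(controller(state))
        gap_err = state[0] - 2.0 * state[1]          # assumed 2 s time-gap target
        cost += gap_err ** 2 + 0.1 * accel ** 2      # illustrative weights
        state = step(state, accel, v_lead_next)
    return cost
```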
6. Measurements
| | DRL | MPC |
| --- | --- | --- |
| Pros | 1. Low computation time 2. Superior handling of modeling errors | 1. Handling of hard constraints 2. Near-optimal with a long prediction horizon |
| Cons | 1. Suffers from the known generalization issue of machine learning 2. Control solutions are black-box neural nets that lack theoretical assurance | 1. High computation time, especially for nonlinear systems 2. Handles modeling errors poorly unless robust MPC is used |
DRL Pros
1. Low computation time
- Comparing episode simulation times, DRL runs about 94 times faster than MPC with a 2.8 s prediction horizon; and MPC with a prediction horizon shorter than 2.8 s performs worse than DRL.
2. Superior handling of modeling errors
- In reinforcement learning, the environment state transition from the current state to the next is treated as an expectation over probabilities, even though the learned state-action mapping is deterministic. This probabilistic state transition accommodates environment stochasticity, which can represent modeling errors, so DDPG's inherent handling of stochasticity contributes to its better modeling-error tolerance (see the sketch below).
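- A sketch of what that stochastic transition can look like in practice (my illustration; the disturbance model and magnitude are assumptions): the actor's state-action mapping is deterministic, but the realized acceleration is perturbed each step, so the transitions the critic learns to evaluate are an expectation over this disturbance, which plays the role of a modeling error.
```python
# Illustration only -- the disturbance model and its magnitude are assumptions.
import numpy as np

rng = np.random.default_rng(0)
DT = 0.1

def noisy_step(state, accel):
    """Same point-mass update as in the methodology sketch, except the realized
    acceleration is perturbed, so the state transition is probabilistic even
    though the policy's state -> action mapping is deterministic."""
    gap, v_ego, v_lead = state
    realized_accel = accel + rng.normal(0.0, 0.2)   # unmodeled actuator/plant error
    return np.array([gap + (v_lead - v_ego) * DT,
                     v_ego + realized_accel * DT,
                     v_lead])
```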
DRL Cons
1. DRL suffers from the known generalization issue of machine learning.
- DRL control performance deteriorates when test inputs fall outside the training data range, indicating inadequate generalization
2. DRL control solutions are black-box neural nets that lack theoretical assurance
- The Q-value is estimated by a deep neural network, so the resulting controller cannot be analyzed or certified the way a model-based controller can (see the sketch below)
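- For concreteness, a minimal DDPG-style critic sketch (framework and layer sizes are my assumptions, not the paper's architecture): the Q-value is just the output of a generic feed-forward network, which is why the learned controller is hard to reason about formally.
```python
# Sketch of a DDPG-style critic; layer sizes are assumptions.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q(s, a) approximated by a plain feed-forward net -- a black box with
    no structure that lends itself to stability or optimality proofs."""
    def __init__(self, state_dim=3, action_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),   # scalar Q-value estimate
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```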
MPC Pros
1. Handling of hard constraints
- Hard constraints (e.g., minimum safe gap, acceleration limits) are enforced directly in the optimization problem, a built-in property of MPC; see the constraint sketch after this list.
2. Near-optimal with a long prediction horizon
- With a sufficiently long prediction horizon (2.8 s here), the receding-horizon solution approaches the benchmark obtained by optimizing the entire episode with IPO, since most of the future that matters for the cost already lies inside the horizon.
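- A minimal receding-horizon sketch showing how hard constraints enter the problem (a linear-quadratic stand-in using cvxpy, not the paper's IPO formulation; the bounds, weights, constant-lead-speed assumption, and the omitted infeasibility handling are all simplifications):
```python
# Simplified linear-quadratic MPC sketch with hard constraints (illustrative only).
import cvxpy as cp

DT, N = 0.1, 28                 # 28 steps * 0.1 s = 2.8 s horizon (step size assumed)
GAP_MIN, A_MAX = 2.0, 2.0       # illustrative hard bounds [m], [m/s^2]

def mpc_first_move(gap0, v_ego0, v_lead):
    """Solve the horizon problem and return only the first acceleration."""
    gap, v, a = cp.Variable(N + 1), cp.Variable(N + 1), cp.Variable(N)
    constraints = [gap[0] == gap0, v[0] == v_ego0]
    cost = 0
    for k in range(N):
        constraints += [
            gap[k + 1] == gap[k] + (v_lead - v[k]) * DT,  # gap dynamics (constant lead speed)
            v[k + 1] == v[k] + a[k] * DT,                 # ego speed dynamics
            gap[k + 1] >= GAP_MIN,                        # hard safety constraint
            cp.abs(a[k]) <= A_MAX,                        # hard actuator limit
        ]
        cost += cp.square(gap[k + 1] - 2.0 * v[k + 1]) + 0.1 * cp.square(a[k])
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return a.value[0]           # receding horizon: apply the first move, then re-solve
```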
MPC Cons
1. High computation time, especially for non-linear systems
- The same timing comparison applies from the other side: per episode, MPC with a 2.8 s prediction horizon runs about 94 times slower than DRL, and shortening the horizon to cut computation degrades its performance below DRL's.
2. Handles modeling errors poorly unless robust MPC is used
- Modeling errors accumulate over the open-loop prediction, so a longer prediction horizon amplifies their effect (see the sketch below).
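- A toy illustration of that accumulation (numbers are made up, not from the paper): a 10% mismatch in how the plant realizes commanded acceleration yields a speed prediction error that grows with the open-loop horizon.
```python
# Illustrative only -- gains, plan, and horizon are assumptions.
import numpy as np

DT, N = 0.1, 28
TRUE_GAIN, MODEL_GAIN = 1.0, 0.9   # plant realizes accel differently than modeled

def predict_speed(gain, v0, accel_plan):
    """Open-loop speed prediction with a given actuator gain."""
    v = v0
    for a in accel_plan:
        v = v + gain * a * DT
    return v

plan = np.full(N, 1.0)             # constant 1 m/s^2 acceleration request
err = abs(predict_speed(TRUE_GAIN, 10.0, plan) - predict_speed(MODEL_GAIN, 10.0, plan))
print(f"speed prediction error after {N * DT:.1f} s horizon: {err:.2f} m/s")
```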
7. Limitations(If it's not written, think about it)
- Only baseline formulations are compared; more advanced variants such as robust MPC or other DRL algorithms are not evaluated
8. Potential Gap
- Good