[Optimal Control] Paper Review: Constrained Policy Optimization

0. Title

- Constrained Policy Optimization

1. Authors

- Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

2. Abstract

- Systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms have enabled new capabilities in high-dimensional control, but do not consider the constrained setting.

- We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration.

- Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training.

3. Motivation

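- The paper is motivated by the gap noted in the abstract: policy search methods scale to high-dimensional control but do not consider the constrained setting. To close that gap, the authors prove new bounds on the difference in returns (or constraint returns) between two arbitrary stochastic policies in terms of an average divergence between them.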

4. Contributions (Findings)

- The main theoretical contribution is a new bound on the difference in returns (or constraint returns) between two arbitrary stochastic policies, expressed in terms of an average divergence between them. This bound is what backs CPO's guarantee of near-constraint satisfaction at each iteration.
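
For reference, the return bound can be written as below. This is my transcription of the paper's corollary for returns, not something stated in this review, so check it against the paper before relying on it. The notation is assumed: $d^{\pi}$ is the discounted state distribution of $\pi$, $A^{\pi}$ is its advantage function, and $D_{TV}$ is the total variation distance.

```latex
J(\pi') - J(\pi) \;\ge\; \frac{1}{1-\gamma}\,
\mathbb{E}_{s \sim d^{\pi},\, a \sim \pi'}
\left[ A^{\pi}(s,a) - \frac{2\gamma\,\epsilon^{\pi'}}{1-\gamma}\, D_{TV}(\pi' \,\|\, \pi)[s] \right],
\qquad
\epsilon^{\pi'} = \max_{s} \left| \mathbb{E}_{a \sim \pi'}\left[ A^{\pi}(s,a) \right] \right|
```

The constraint-return version has the same shape with the inequality reversed and the divergence term added rather than subtracted, which gives an upper bound on how much a cost can grow after one policy update.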

5. Methodology

Constrained MDP

A constrained Markov decision process (CMDP) is an MDP augmented with constraints that restrict the set of allowable policies for that MDP.
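
As a rough illustration of what "allowable" means here: in the usual CMDP formulation, a policy is feasible when its expected discounted cost return stays at or below a limit. The sketch below estimates that quantity from sampled cost trajectories. The function names, the sample data, and the single-constraint setup are my own assumptions for illustration, not code from the paper.

```python
from typing import Sequence


def discounted_return(signal: Sequence[float], gamma: float) -> float:
    """Discounted sum of a per-step signal (a reward or a cost)."""
    total = 0.0
    # Accumulate backwards so each step needs one multiply and one add.
    for value in reversed(signal):
        total = value + gamma * total
    return total


def estimate_cost_return(cost_trajectories: Sequence[Sequence[float]],
                         gamma: float) -> float:
    """Monte Carlo estimate of the expected discounted cost return."""
    returns = [discounted_return(costs, gamma) for costs in cost_trajectories]
    return sum(returns) / len(returns)


def is_feasible(cost_trajectories: Sequence[Sequence[float]],
                gamma: float, limit: float) -> bool:
    """True if the estimated cost return does not exceed the limit."""
    return estimate_cost_return(cost_trajectories, gamma) <= limit


# Hypothetical rollouts: per-step costs from two episodes of one policy.
rollouts = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(is_feasible(rollouts, gamma=0.5, limit=1.0))
```

Note that the check is on an average over trajectories, so a single episode can exceed the limit while the policy still counts as feasible.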

6. Measurements

- Does CPO succeed at enforcing behavioral constraints when training neural network policies with thousands of parameters?

- How does CPO compare with a baseline that uses primal-dual optimization? Does CPO behave better with respect to constraints?

- How much does it help to constrain a cost upper bound, instead of directly constraining the cost?

- What benefits are conferred by using constraints instead of fixed penalties?
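
To make the last two comparison targets concrete, here is a minimal sketch of the difference between a fixed penalty and a primal-dual (Lagrangian) scheme. It is a generic textbook version under my own assumptions (scalar return estimates, a single constraint, hypothetical function names and numbers), not the baseline implementation used in the paper.

```python
def penalized_objective(reward_return: float, cost_return: float,
                        penalty: float) -> float:
    """Fixed-penalty objective: the coefficient is chosen once and never adapts."""
    return reward_return - penalty * cost_return


def lagrangian(reward_return: float, cost_return: float,
               limit: float, multiplier: float) -> float:
    """Primal-dual objective: only the cost in excess of the limit is penalized."""
    return reward_return - multiplier * (cost_return - limit)


def dual_update(multiplier: float, cost_return: float,
                limit: float, step_size: float) -> float:
    """One projected gradient-ascent step on the Lagrange multiplier.

    The multiplier grows while the constraint is violated and shrinks
    (never below zero) once the policy is within the limit.
    """
    return max(0.0, multiplier + step_size * (cost_return - limit))


# Hypothetical numbers: cost return 30 against a limit of 25.
multiplier = dual_update(1.0, cost_return=30.0, limit=25.0, step_size=0.1)
print(multiplier)
print(lagrangian(100.0, 30.0, 25.0, multiplier))
print(penalized_objective(100.0, 30.0, penalty=2.0))
```

With a fixed penalty the trade-off between reward and cost has to be tuned by hand, whereas the dual update adjusts it from the observed violation. Neither scheme by itself says anything about constraint satisfaction during the intermediate iterations, which is the property the questions above probe.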

7. Limitations (If it's not written, think about it)

- CPO is unsuitable for use cases where safety must be ensured in every visited state, including during training, because it guarantees only near-constraint satisfaction at each iteration.

8. Potential Gap

- Not clear to me yet, since the method does not fully satisfy the constraints.

Team. WannaBeHappy
