
Autonomous Driving

DS-RNN Architecture 분석


Author: KangchanRoh

Team: Reinforcement Learning Team @ CAI Lab
Date: 2022/11/30


CrowdNav DS-RNN

 

This is a concise summary of the core network architecture of the paper presented at the link above.


  • Jain et al. propose a general method called structural-RNN (S-RNN) that transforms any st-graph into a mixture of RNNs that learn the parameters of factor functions end-to-end. Our work is the first to combine S-RNN with model-free RL for crowd navigation.
  • An st-graph uses nodes to represent the problem components and edges to capture the spatio-temporal interactions.

$A.\ Problem\ Formulation$

  • $MDP:\ <\mathcal{S},\mathcal{A},P,R,\gamma,\mathcal{S_0}>$
  • $\operatorname{w}^t : robot's\ state,\ \operatorname{w}^t=[(p_x,p_y),(v_x,v_y),(g_x,g_y),v_{max},\theta,\rho]$
  • $\operatorname{u}^t_i : i-th\ human's\ state,\ \operatorname{u}^t_i=(p_x^i,p_y^i)$
  • $s_t \in \mathcal{S},\ s_t=[\operatorname{w}^t,\operatorname{u}^t_1,...,\operatorname{u}^t_n]$
  • $a_t \in \mathcal{A},\ a_t=[v_x,v_y]$
  • $r(s_t,a_t)\in R$: $$ r(s_t,a_t)=\begin{cases} -20, & \text{if } d_{min}^t<0\\ 2.5(d_{min}^t-0.25), & \text{if } 0<d_{min}^t<0.25\\ 10, & \text{if } d_{goal}^t \le \rho_{robot}\\ 2(-d_{goal}^t+d_{goal}^{t-1}), & \text{otherwise.} \end{cases} $$
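The piecewise reward above maps directly to code. This is a minimal sketch, assuming scalar distances as inputs; the function name and the default robot radius $\rho_{robot}$ are illustrative, not from the paper's code.

```python
# Hypothetical sketch of the reward cases above.
def reward(d_min, d_goal, d_goal_prev, rho_robot=0.3):
    """One-step reward, given the minimum distance to any human (d_min)
    and the goal distance at the current and previous steps."""
    if d_min < 0:                         # collision with a human
        return -20.0
    if 0 < d_min < 0.25:                  # uncomfortably close: small penalty
        return 2.5 * (d_min - 0.25)
    if d_goal <= rho_robot:               # goal reached
        return 10.0
    return 2.0 * (-d_goal + d_goal_prev)  # reward progress toward the goal
```

Note that the last case is positive when the robot moved closer to the goal since the previous step, which shapes the policy toward goal-directed motion.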

$B.\ Spatio-Temporal\ Graph\ Representation$

  • $st-graph\ components: \mathcal{G}=(\mathcal{V},\mathcal{E_S},\mathcal{E_T})$
  • $\mathcal{V}:set\ of\ nodes$
  • $\mathcal{E_S}:set\ of\ spatial\ edges$
  • $\mathcal{E_T}:set\ of\ temporal\ edges$
  • Spatial Edge : $x^t_{{u_i}w}=(p^i_x-p_x,p^i_y-p_y),\ x^t_{{u_i}w}\in\mathcal{E_S}$
  • Temporal Edge : $x^t_{ww} =(v_x,v_y),\ x^t_{ww}\in\mathcal{E_T}$
  • Node Feature : $x^t_{w} =\operatorname{w}^t=[(p_x,p_y),(v_x,v_y),(g_x,g_y),v_{max},\theta,\rho]$
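The edge and node features above are simple functions of the raw states. A minimal sketch, assuming the robot state comes as a dict and each human as a position tuple (the names `robot` and `humans` are illustrative):

```python
# Hypothetical sketch: building st-graph features from raw states.
def st_graph_features(robot, humans):
    """robot: dict with p=(x,y), v=(x,y), g=(x,y), v_max, theta, rho.
    humans: list of (p_x, p_y) positions."""
    px, py = robot["p"]
    # Spatial edges x^t_{u_i w}: each human's position relative to the robot
    spatial = [(hx - px, hy - py) for (hx, hy) in humans]
    # Temporal edge x^t_{ww}: the robot's velocity, linking consecutive steps
    temporal = robot["v"]
    # Node feature x^t_w: the full robot state w^t
    node = [*robot["p"], *robot["v"], *robot["g"],
            robot["v_max"], robot["theta"], robot["rho"]]
    return spatial, temporal, node
```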

st-graph representation

$C.\ Network\ Architecture$

DS-RNN Process

DS-RNN Network Architecture

  • $x^t_{{u_i}w}$ → $\operatorname{R}_S$ → $h_{u_{i}w}^t$ $$ h_{u_{i}w}^t = \operatorname{RNN}(h_{u_{i}w}^{t-1},\ f_{spatial}(x_{u_iw}^t)) $$
  • $x^t_{ww}$ → $\operatorname{R}_T$ → $h_{ww}^t$ $$ h_{ww}^t = \operatorname{RNN}(h_{ww}^{t-1},\ f_{temporal}(x_{ww}^t)) $$
  • $h_{u_{i}w}^t,h_{ww}^t$ → Attention Module → $v_{att}^t$ 
    • Attention Module
      • Notation
        • $V^t = [h_{u_{1}w}^t,...,h_{u_{n}w}^t]^\top$ : Output of $\operatorname{R}_S$
        • $W$: Weight
        • $v_{att}^t$ : Attention weighted sum of spatial edges
      • Linear transformations$$ Q^t=V^tW_Q,\ \ K^t=h^t_{ww}W_K $$
      • Attention weight$$ \alpha^t=\operatorname{softmax}({n \over \sqrt{d_k}}Q^t(K^t)^\top) $$
      • Output : Attention weighted spatial hidden state$$ v_{att}^t=(V^t)^\top \alpha^t $$
  • $v_{att}^t$ + $x^t_{w}$ → $\operatorname{R}_N$ → $h_{w}^t$ $$ h_{w}^t = \operatorname{RNN}(h_{w}^{t-1},\ [\ f_{edge}(v_{att}^t,h_{ww}^t),\ f_{node}(x_{w}^t)\ ]) $$
  • $h_{w}^t$ → PPO → $V(s_t),\pi(a_t|s_t)$
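The attention module above (linear maps $W_Q$, $W_K$, scaled dot-product scores with the $n/\sqrt{d_k}$ factor, and the weighted sum $v_{att}^t$) can be sketched in NumPy. The function name and the random weights in the example are assumptions; in the actual network $W_Q$ and $W_K$ are learned parameters.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sketch of the attention module; W_Q, W_K stand in for
# the learned linear transformations.
def attention(V, h_ww, W_Q, W_K):
    """V: (n, d) spatial-edge hidden states from R_S,
    h_ww: (d,) temporal-edge hidden state from R_T.
    Returns v_att: the attention-weighted sum of the rows of V."""
    n, d_k = V.shape[0], W_K.shape[1]
    Q = V @ W_Q                                   # (n, d_k) one query per human
    K = h_ww @ W_K                                # (d_k,) single key
    alpha = softmax(n / np.sqrt(d_k) * (Q @ K))   # (n,) attention weights
    return V.T @ alpha                            # (d,) v_att
```

With a single human ($n=1$) the softmax collapses to weight 1, so $v_{att}^t$ is exactly that human's spatial hidden state; with more humans it blends them according to their relevance to the robot's own motion.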