
Recurrent Neural Network

RNN(Recurrent Neural Network)#

$$
\begin{aligned}
input &= Dim(batch\_size, timesteps, input\_size) \\
output &= Dim(batch\_size, timesteps, hidden\_size)
\end{aligned}
$$

$$h_t = \tanh(W_{hx} x_t + W_{hh} h_{t-1} + b_h)$$
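
A minimal sketch of these shapes, assuming PyTorch's `nn.RNN` with `batch_first=True`; the `batch_size`, `timesteps`, `input_size`, and `hidden_size` values are placeholders:

```python
import torch
import torch.nn as nn

batch_size, timesteps, input_size, hidden_size = 4, 10, 8, 16

# nn.RNN with the default tanh nonlinearity matches the update rule above.
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)

x = torch.randn(batch_size, timesteps, input_size)   # (batch_size, timesteps, input_size)
output, h_n = rnn(x)

print(output.shape)  # torch.Size([4, 10, 16]) -> (batch_size, timesteps, hidden_size)
print(h_n.shape)     # torch.Size([1, 4, 16])  -> final hidden state per layer
```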

Vanishing/exploding gradient#

An RNN processing a long sequence is effectively a very deep network once unrolled over time, so it is prone to vanishing/exploding gradient problems.

  • Using a non-saturating activation such as ReLU can make the network unstable.
  • If exploding gradients are observed, you can try gradient clipping to cap the gradient values, as in the sketch after this list.
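
A minimal sketch of gradient clipping in a single training step, assuming PyTorch; the toy model, dummy data, and `max_norm=1.0` threshold are illustrative choices, not fixed recommendations:

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 10, 8)          # (batch_size, timesteps, input_size)
target = torch.randn(4, 10, 16)    # dummy target with the output shape

output, _ = model(x)
loss = nn.functional.mse_loss(output, target)
loss.backward()

# Rescale gradients so that their global L2 norm is at most max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```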

LSTM(Long Short-Term Memory)#

When an RNN processes a long sequence, the influence of inputs from early in the sequence can fade as the recurrence is repeated. This is known as the long-term dependency problem. Using LSTM cells alleviates the long-term dependency problem and lets training converge faster.

$$
\begin{aligned}
f_t &= \sigma(W_{xf} x_t + b_{xf} + W_{hf} h_{t-1} + b_{hf}) \\
i_t &= \sigma(W_{xi} x_t + b_{xi} + W_{hi} h_{t-1} + b_{hi}) \\
\widetilde{c_t} &= \tanh(W_{x\tilde{c}} x_t + b_{x\tilde{c}} + W_{h\tilde{c}} h_{t-1} + b_{h\tilde{c}}) \\
o_t &= \sigma(W_{xo} x_t + b_{xo} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes \widetilde{c_t} \\
h_t &= o_t \otimes \tanh(c_t)
\end{aligned}
$$
  • $c_t$ is the cell state.
  • $h_t$ is the output.
  • $f_t$ is the forget gate; it decides how much of the previous cell state to forget.
  • $i_t$ is the input gate; it decides how much of the input information to store in the cell state.
  • $o_t$ is the output gate; it decides which information from the updated cell state to emit.
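
Below is a minimal NumPy sketch of one LSTM step that follows the equations above; the layer sizes and randomly initialized parameters are assumptions for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

input_size, hidden_size = 8, 16
rng = np.random.default_rng(0)

# One input weight, hidden weight, and bias pair per gate: f, i, c (candidate), o.
W_x = {g: rng.standard_normal((hidden_size, input_size)) * 0.1 for g in "fico"}
W_h = {g: rng.standard_normal((hidden_size, hidden_size)) * 0.1 for g in "fico"}
b_x = {g: np.zeros(hidden_size) for g in "fico"}
b_h = {g: np.zeros(hidden_size) for g in "fico"}

def lstm_step(x_t, h_prev, c_prev):
    f = sigmoid(W_x["f"] @ x_t + b_x["f"] + W_h["f"] @ h_prev + b_h["f"])        # forget gate
    i = sigmoid(W_x["i"] @ x_t + b_x["i"] + W_h["i"] @ h_prev + b_h["i"])        # input gate
    c_tilde = np.tanh(W_x["c"] @ x_t + b_x["c"] + W_h["c"] @ h_prev + b_h["c"])  # candidate cell state
    o = sigmoid(W_x["o"] @ x_t + b_x["o"] + W_h["o"] @ h_prev + b_h["o"])        # output gate
    c_t = f * c_prev + i * c_tilde   # new cell state
    h_t = o * np.tanh(c_t)           # new output
    return h_t, c_t

h_t, c_t = lstm_step(rng.standard_normal(input_size),
                     np.zeros(hidden_size), np.zeros(hidden_size))
print(h_t.shape, c_t.shape)  # (16,) (16,)
```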

GRU(Gated Recurrent Unit)#

The GRU is one of the LSTM variants; it merges the cell state and the hidden state and uses two gates instead of three.

$$
\begin{aligned}
r_t &= \sigma(W_{xr} x_t + b_{xr} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{xz} x_t + b_{xz} + W_{hz} h_{t-1} + b_{hz}) \\
\widetilde{h_t} &= \tanh(W_{x\tilde{h}} x_t + b_{x\tilde{h}} + r_t \otimes (W_{h\tilde{h}} h_{t-1} + b_{h\tilde{h}})) \\
h_t &= z_t \otimes h_{t-1} + (1 - z_t) \otimes \widetilde{h_t}
\end{aligned}
$$
  • $h_t$ is the output.
  • $r_t$ is the reset gate; it decides how much of the previous state contributes to the candidate state $\widetilde{h_t}$.
  • $z_t$ is the update gate; the closer it is to 1, the more of the previous state is kept, and the closer it is to 0, the more of the new candidate state is stored.
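
Below is a minimal NumPy sketch of one GRU step following the equations above; again, the sizes and random parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

input_size, hidden_size = 8, 16
rng = np.random.default_rng(1)

# One input weight, hidden weight, and bias pair per gate: r, z, h (candidate).
W_x = {g: rng.standard_normal((hidden_size, input_size)) * 0.1 for g in "rzh"}
W_h = {g: rng.standard_normal((hidden_size, hidden_size)) * 0.1 for g in "rzh"}
b_x = {g: np.zeros(hidden_size) for g in "rzh"}
b_h = {g: np.zeros(hidden_size) for g in "rzh"}

def gru_step(x_t, h_prev):
    r = sigmoid(W_x["r"] @ x_t + b_x["r"] + W_h["r"] @ h_prev + b_h["r"])  # reset gate
    z = sigmoid(W_x["z"] @ x_t + b_x["z"] + W_h["z"] @ h_prev + b_h["z"])  # update gate
    h_tilde = np.tanh(W_x["h"] @ x_t + b_x["h"]
                      + r * (W_h["h"] @ h_prev + b_h["h"]))                # candidate state
    return z * h_prev + (1.0 - z) * h_tilde  # interpolate between previous state and candidate

h_t = gru_step(rng.standard_normal(input_size), np.zeros(hidden_size))
print(h_t.shape)  # (16,)
```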