[GSoC 2020][Basic] What is a Recurrent Neural Network?

Since my mentor and I are trying to redirect the project toward an audio-visual oriented research direction, I need to pick up the basics of RNNs and dive into more sophisticated models. Therefore, I think it is worth writing down the things that I read. The first topic is quite basic: the Recurrent Neural Network.

Why do we need a Recurrent Neural Network (RNN)?

As the name suggests, the data is used recurrently in the model. The basic idea is to enable the network to take sequential information into account. For instance, in a speech recognition task, if we know that the previous word is ‘Make’, then it is preferable for the model to choose a noun, e.g. ‘Make America Great Again’ (no offense), instead of another verb.

However, most feed-forward networks process the data independently. A typical CNN treats each image without considering the next input image. We would have to use a 3D CNN to capture sequential information, which is computationally expensive. What can we do to enable the network to think in context? Here comes the idea of the RNN.

A typical CNN (VGG16) only considers one image at a time [1]

What is an RNN?

The following figure is widely used when someone wants to illustrate the structure of the network.

RNN structure [2]

It took me some time to understand the basic idea. Then I realized how compact and informative the left figure is!

Definition:

Let x be the input vector, s the hidden state, and o the output label/vector. U, V, and W, together with the arrows, each represent a weight matrix applied to the data. Note that W is applied to s and then feeds back into s. How is that possible?

That’s the spirit of “recurrent”. Let’s focus on the left-hand-side image for the moment.

Note that the data is sequential, so let us denote the samples x_{t-1}, x_t, and x_{t+1}. Suppose that we are at time t: we feed the data sample x_t and obtain U x_t. What happens in the hidden state? In fact, at this moment, U x_t is not simply passed into the circle and back to itself. We check whether there is a “stored” hidden state from the previous sample. In this case, we have s_{t-1}, produced from x_{t-1}. So we actually calculate s_t by:

s_t = f(W \cdot s_{t-1} + U \cdot x_{t})

where f is an activation function (e.g. tanh). Consequently, the output o_t is

o_t = g(V \cdot s_t) = g(V\cdot f(W\cdot s_{t-1} + U \cdot x_{t}))
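
To make the recurrence concrete, here is a minimal NumPy sketch of a single step. The dimensions, tanh for f, and softmax for g are my own illustrative choices; the figure above does not fix them.

```python
import numpy as np

# Illustrative dimensions (not fixed by the figure above).
input_dim, hidden_dim, output_dim = 8, 16, 4

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input  -> hidden
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden -> output

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, s_prev):
    """One recurrence step: s_t = f(W s_{t-1} + U x_t), o_t = g(V s_t)."""
    s_t = np.tanh(W @ s_prev + U @ x_t)  # f = tanh (a common choice)
    o_t = softmax(V @ s_t)               # g = softmax (a common choice)
    return s_t, o_t
```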

That is how we understand the image on the right-hand side: it unfolds the network through the sequence.

Similarly, s_t will be stored in the hidden state for the next input sample x_{t+1}. Once we understand this state-passing mechanism, it is intuitive to extend it to x_{t-1} and x_{t+1}.

Remarks:

  • One question: how do we deal with the first entry of the RNN, since it has no predecessor? Well, in practice, we simply initialize s_0 as a zero vector to indicate that s_{-1} doesn’t exist.
  • Another remark is that the weight matrices U, V, and W are shared across the whole sequence, as the sketch after this list illustrates. Parameter sharing is also an important notion in CNNs, used to reduce the number of parameters to train.
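
Both remarks show up directly when we unroll the step function over a whole sequence: the state starts at zero, and the same U, W, and V are reused at every time step. A sketch, reusing rnn_step and the weight matrices from the block above:

```python
def rnn_forward(xs):
    """Unroll the RNN over a sequence xs of input vectors."""
    s = np.zeros(hidden_dim)  # s_0 is a zero vector, since s_{-1} doesn't exist
    outputs = []
    for x_t in xs:            # the same U, W, V are shared at every step
        s, o_t = rnn_step(x_t, s)
        outputs.append(o_t)
    return outputs

# Example: a sequence of 5 random input vectors.
sequence = [rng.normal(size=input_dim) for _ in range(5)]
print(rnn_forward(sequence)[-1])  # output distribution at the last step
```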

Code:

Well, the goal is not only to understand, but also to practice. So I have found the following project, which seems to be an interesting use of a simple RNN.

https://github.com/karpathy/char-rnn: a character-level text generator using RNN/LSTM/GRU

References:

[1] https://neurohive.io/en/popular-networks/vgg16/

[2] https://zhuanlan.zhihu.com/p/34152808

[3] https://zybuluo.com/hanbingtao/note/541458

[4] [Very good] http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
