Since my mentor and I are trying to redirect the project toward audio-visual research, I need to pick up the basics of RNNs and then dive into more sophisticated models. I therefore think it is a good idea to write down the things that I read. The first topic is quite basic: the Recurrent Neural Network.
Why do we need a Recurrent Neural Network (RNN)?
As the name suggests, the data is used recurrently in the model. The basic idea is to enable the network to take into account the order of the information in a sequence. For instance, in a speech recognition task, if we know that the previous word is ‘Make’, then it is preferable for the model to choose a noun next, e.g. Make America Great Again (no offense), instead of another verb.
However, most feed-forward networks process each data sample independently. A typical CNN treats each image without considering the previous or the next input image. To bring in sequential information we would have to use something like a 3D CNN, which is computationally expensive. How can we enable the network to think in context? Here comes the idea of the RNN.
What is an RNN?
I think the following figure is widely presented when someone wants to illustrate the structure of the network.
It took me some time to understand the basic idea. Then I realized how compact and informative the left figure is!
Let $x$ be the input vector, $h$ the hidden state, and $o$ the output label/vector. $U$, $W$ and $V$, each drawn next to an arrow, represent the weight matrices applied to the data. Note that $W$ is applied to $h$ and then goes back to $h$. How is that?
That’s the spirit of “recurrent”. Let’s focus on the left-hand image for the moment.
Note that the data is sequential, so let’s write $x_{t-1}$, $x_t$ and $x_{t+1}$. Suppose that we are at time $t$, so we feed the data sample $x_t$ and obtain $h_t$. What happens in the hidden state? In fact, at this moment $h_t$ is not sent around the loop and straight back into itself. Instead, we check whether there is a “stored” hidden state from the previous sample. In this case, we have $h_{t-1}$, which was produced from $x_{t-1}$. So we actually calculate $h_t$ by:

$$h_t = f(U x_t + W h_{t-1})$$

where $f$ is an activation function. Consequently, the output is

$$o_t = V h_t$$
So that is how we understand the image on the right-hand side. It unfolds the network in sequence.
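To make the update at a single time step concrete, here is a minimal sketch in plain Python. The names `U` (input to hidden), `W` (hidden to hidden) and `V` (hidden to output) are labels assumed for this illustration. The `tanh` activation, the missing bias terms and the toy sizes are my own assumptions too, not something fixed by the figure. A real implementation would use NumPy or a deep learning framework.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, U, W, V):
    """One time step of a simple RNN (assumed form, no biases):
    h_t = tanh(U x_t + W h_prev), o_t = V h_t."""
    pre = [a + b for a, b in zip(matvec(U, x_t), matvec(W, h_prev))]
    h_t = [math.tanh(p) for p in pre]   # new hidden state
    o_t = matvec(V, h_t)                # output read from the hidden state
    return h_t, o_t

# Hypothetical toy sizes: 2-dim input, 3-dim hidden state, 2-dim output.
U = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]                  # hidden x input
W = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]   # hidden x hidden
V = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]                     # output x hidden

h_prev = [0.0, 0.0, 0.0]   # stored state from the previous sample
h_t, o_t = rnn_step([1.0, 2.0], h_prev, U, W, V)
print(h_t, o_t)
```

The returned `h_t` is exactly what would be kept around and passed in as `h_prev` for the following sample.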
Similarly, $h_t$ will be stored in the hidden state for the next input sample $x_{t+1}$. Once we understand the status of the network at time $t$, it is intuitive to extend it to $t-1$ and $t+1$.
- One question: how do we deal with the first entry of the RNN, since it has no predecessor? Well, in practice, for the first input $x_0$, we simply use a zero vector as the previous hidden state to indicate that $h_{-1}$ doesn’t exist.
- Another remark is that the weight parameters $U$, $W$ and $V$ are shared across all time steps of the network. Parameter sharing is also an important notion in CNNs, where it reduces the number of parameters to train.
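The two remarks above can be sketched as a loop over a sequence: the hidden state starts as a zero vector, and the same three weight matrices are reused at every step. As before, this is only an illustration under assumptions: the names `U`, `W`, `V`, the `tanh` activation, the absence of biases and the scalar toy values are all chosen for the example.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, U, W, V):
    """Run a simple RNN over the sequence xs; return outputs and hidden states."""
    h = [0.0] * len(W)      # zero vector: no hidden state exists before the first input
    hs, outputs = [], []
    for x_t in xs:          # the same U, W, V are used at every time step
        pre = [a + b for a, b in zip(matvec(U, x_t), matvec(W, h))]
        h = [math.tanh(p) for p in pre]
        hs.append(h)
        outputs.append(matvec(V, h))
    return outputs, hs

# Hypothetical scalar example: one input unit, one hidden unit, one output unit.
U, W, V = [[1.0]], [[0.5]], [[2.0]]
xs = [[1.0], [0.0], [0.0]]  # only the first input is non-zero

outputs, hs = rnn_forward(xs, U, W, V)
print([h[0] for h in hs])
```

Although the second and third inputs are zero, the hidden state stays positive and only decays gradually, which shows the first input being carried forward as context. The number of parameters also does not depend on the length of `xs`, since no new weights are created inside the loop.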
Well, the goal is not only to understand, but also to practice. So I have found the following project, which seems interesting and uses a simple RNN.
https://github.com/karpathy/char-rnn Character generator using RNN/LSTM/GRU