[GSoC 2020] Attention mechanism and Transformer

Papers:

Attention Is All You Need
Neural Machine Translation by Jointly Learning to Align and Translate
Sequence to Sequence Learning with Neural Networks
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Posts:

The Illustrated Transformer
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

A very interesting visualization of attention

What Are Word Embeddings for Text?
    • Word Embedding:
      • words with similar meanings have similar representations (in contrast to one-hot encoding, where every word is equally distant from every other)
      • Distributional hypothesis: words that have similar contexts will have similar meanings.
    • Word embedding algorithms
      • Embedding layer (see the Keras sketch after this outline)
        • one-hot encoded words are mapped to dense word vectors
        • sits at the front end of a neural network
        • the network is trained to map the one-hot codes to dense representations
      • Word2Vec (see the Word2Vec sketch after this outline)
        • a statistical method for learning word embeddings efficiently from a text corpus
        • two models:
          • Continuous Bag-of-Words (CBOW) model: learns the embedding by predicting the current word from its context
          • Continuous Skip-Gram model: learns by predicting the surrounding words given the current word
          • Note that the context is a window of neighboring words
        • #TODO: Read more. Efficient Estimation of Word Representations in Vector Space
      • GloVe (see the co-occurrence sketch after this outline)
        • an extension of Word2Vec
        • integrates the global statistics of matrix-factorization techniques such as Latent Semantic Analysis (LSA) with the local context-based learning of Word2Vec
        • rather than relying on a local window alone, it constructs an explicit word-context (word co-occurrence) matrix from statistics across the whole text corpus
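
A minimal sketch of an embedding layer as the front end of a network. Keras is assumed here only because the linked example code is Keras-based; the vocabulary size, embedding dimension, toy task, and fake data are illustration values, not anything from the post.

```python
# Minimal sketch: a learned Embedding layer as the front end of a small model.
# Vocabulary size, embedding dimension, and shapes are illustrative only.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

vocab_size = 1000   # number of distinct tokens (fed as integer ids, not one-hot)
embed_dim = 16      # size of each learned word vector

model = Sequential([
    # Maps each integer token id to a dense embed_dim vector; the mapping is
    # learned jointly with the rest of the network during training.
    Embedding(input_dim=vocab_size, output_dim=embed_dim),
    GlobalAveragePooling1D(),        # average the word vectors of a sequence
    Dense(1, activation="sigmoid"),  # toy binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Fake data just to show the expected shapes: (batch, sequence_length) int ids.
x = np.random.randint(0, vocab_size, size=(32, 10))
y = np.random.randint(0, 2, size=(32, 1))
model.fit(x, y, epochs=1, verbose=0)

print(model.layers[0].get_weights()[0].shape)  # (vocab_size, embed_dim)
```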
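A short sketch of the two Word2Vec training modes. The gensim library and the toy corpus and hyperparameters are my own choices for illustration; the post does not prescribe a particular implementation.

```python
# Minimal sketch: CBOW vs. skip-gram with gensim's Word2Vec (gensim >= 4 API).
# The toy corpus and hyperparameters are illustrative only.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "lay", "on", "the", "rug"],
]

# sg=0 -> CBOW: predict the current word from the window of surrounding words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> skip-gram: predict the surrounding words given the current word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["cat"].shape)             # (50,): learned embedding for "cat"
print(skipgram.wv.most_similar("cat"))  # nearest words by cosine similarity
```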
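This is not a reimplementation of GloVe; it is only a numpy sketch (my choice) of the corpus-wide word co-occurrence matrix that GloVe-style methods factorize. Note that the counts here are still accumulated with a local window, but they are aggregated over the whole corpus before any factorization, which is the "global statistics" contrast with Word2Vec's window-by-window updates.

```python
# Minimal sketch: accumulating a word co-occurrence matrix over a whole corpus.
# GloVe factorizes such global statistics; its actual objective is not shown.
import numpy as np

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "lay", "on", "the", "rug"],
]
context_size = 2  # neighbors on each side counted as "context" (illustrative)

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - context_size), min(len(sent), i + context_size + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[index[word], index[sent[j]]] += 1.0

print(vocab)
print(cooc)  # cooc[i, j]: how often word j appears near word i in the corpus
```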

Detailed Explanation of the Attention Mechanism, Part 2: Self-Attention and the Transformer (post in Chinese)
[NLP] Attention Model: A Study Summary (post in Chinese)
Query-Key-Value model (a short numpy sketch of scaled dot-product attention follows this list)
Attention Model: A Study Summary (post in Chinese)
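
A small numpy sketch of the Query-Key-Value idea as scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, following "Attention Is All You Need"; the shapes and random inputs are illustrative only.

```python
# Minimal sketch: single-head scaled dot-product attention,
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                     # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))    # 3 queries of dimension d_k = 8
K = rng.normal(size=(5, 8))    # 5 keys of dimension d_k = 8
V = rng.normal(size=(5, 16))   # 5 values of dimension d_v = 16

out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape)          # (3, 16): one context vector per query
print(attn.sum(axis=-1))  # each query's attention weights sum to 1
```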

Code:

https://github.com/Choco31415/Attention_Network_With_Keras
