Attention layer output
The method task_specific_attention
applies attention to the projected vectors instead of the hidden vectors (the outputs of the RNN cell).
Is this intentional, or has the attention formulation from the paper been missed? In the paper, the final sentence vector is a weighted sum of the hidden states, NOT of the inner projected vectors.
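For reference, the formulation the question alludes to (from the Hierarchical Attention Networks paper) projects each hidden state only to score it, then weights the hidden states themselves. A minimal NumPy sketch of that version (all names and shapes here are illustrative, not taken from the repository's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def paper_attention(hidden, W, b, u_w):
    """Paper-style attention: u_t = tanh(W h_t + b),
    alpha = softmax(u_t . u_w), and the sentence vector is
    sum_t alpha_t * h_t -- the weights are applied to the
    HIDDEN states, so the output keeps the hidden dimension."""
    u = np.tanh(hidden @ W + b)                   # (T, attn_dim) projected vectors
    alpha = softmax(u @ u_w, axis=0)              # (T,) attention weights
    return (alpha[:, None] * hidden).sum(axis=0)  # (hidden_dim,)

rng = np.random.default_rng(0)
T, hidden_dim, attn_dim = 5, 8, 4
hidden = rng.normal(size=(T, hidden_dim))  # stand-in for RNN outputs
W = rng.normal(size=(hidden_dim, attn_dim))
b = np.zeros(attn_dim)
u_w = rng.normal(size=attn_dim)            # context vector

s = paper_attention(hidden, W, b, u_w)
print(s.shape)  # (8,) -- same dimension as the hidden states
```

Here the projection `u` is used only to compute the weights `alpha`; the summation runs over `hidden`, which is the point of contention in this issue.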
Issue Analytics
- Created: 6 years ago
- Reactions: 4
- Comments: 5
Top Results From Across the Web
- A Beginner's Guide to Using Attention Layer in Neural Networks: A layer that can help a neural network memorize long sequences of information or data can be considered an attention...
- Attention layer - Keras: Dot-product attention layer, a.k.a. Luong-style attention. ... Returns the attention scores (after masking and softmax) as an additional output argument.
- A Brief Overview of Attention Mechanism | by Synced - Medium: Attention is simply a vector, often the output of a dense layer using a softmax function. Before the attention mechanism, translation relied on ...
- Adding a Custom Attention Layer to a Recurrent Neural Network: This tutorial shows how to add a custom attention layer to a network built using a recurrent neural network. We'll illustrate an end-to-end ...
- Attention (machine learning) - Wikipedia: In artificial neural networks, attention is a technique that is meant to mimic cognitive attention. The effect enhances some parts of the input...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@heisenbugfix I agree with what you mentioned in the previous comment. However, the final attention is applied to the hidden states, NOT to the projected vector.
As per my understanding, these are the steps to apply attention:
I find a mismatch at the 4th step in your code. Please correct me if I am wrong.
@krayush07 I think it's more of a personal choice where to apply the attention weights. In the paper, the authors project the hidden state to the same dimension, compute attention, and apply it to the hidden state. In this implementation, however, the hidden state is projected to a lower dimension to compute attention. So I'm guessing he applies attention to the projected vector instead because he wants a lower dimension for the encoded sentence vector.
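The variant described in this comment can be sketched the same way: the weights are computed identically, but the summation runs over the projected vectors, so the encoded sentence vector takes the smaller projection dimension. Again a hedged NumPy sketch with illustrative names, not the repository's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def projected_attention(hidden, W, b, u_w):
    """Implementation-style variant: attention weights are applied to
    the projected vectors u_t rather than the hidden states h_t, so
    the output has the (lower) projection dimension."""
    u = np.tanh(hidden @ W + b)              # (T, attn_dim)
    alpha = softmax(u @ u_w, axis=0)         # (T,) attention weights
    return (alpha[:, None] * u).sum(axis=0)  # (attn_dim,)

rng = np.random.default_rng(0)
T, hidden_dim, attn_dim = 5, 8, 4
hidden = rng.normal(size=(T, hidden_dim))
W = rng.normal(size=(hidden_dim, attn_dim))
b = np.zeros(attn_dim)
u_w = rng.normal(size=attn_dim)

v = projected_attention(hidden, W, b, u_w)
print(v.shape)  # (4,) -- the lower projection dimension, not (8,)
```

Both variants use the same scores; the only difference is which vectors the weighted sum runs over, which is exactly the trade-off (fidelity to the paper vs. a smaller sentence encoding) debated in this thread.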