
Attention layer output

See original GitHub issue

The method task_specific_attention applies attention to the projected vectors instead of the hidden vectors (output from RNN cell).

Is this intentional, or does it deviate from the paper, where the final sentence vector is a weighted sum of the hidden states and NOT of the projected vectors?

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 4
  • Comments: 5

Top GitHub Comments

1 reaction
krayush07 commented, Apr 20, 2018

@heisenbugfix I agree with what you mentioned in the previous comment. However, the final attention should be applied to the hidden states, NOT to the projected vectors.

As per my understanding, these are the steps to apply attention:

  1. Collect the hidden states.
  2. Apply a projection to get the projected vectors.
  3. Use the projected vectors and the attention vector to compute attention weights.
  4. Use the attention weights and the hidden states to compute the attended output.

I see a mismatch at step 4 in your code. Please correct me if I am wrong.
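The steps above can be sketched in NumPy. This is an illustrative sketch of the attention scheme from the paper, not code from the repository; the shapes, weight names, and context vector are all assumptions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, d, a = 5, 8, 4                  # time steps, hidden size, attention size (illustrative)
H = rng.normal(size=(T, d))        # step 1: hidden states from the RNN
W = rng.normal(size=(d, a))        # projection weights (hypothetical, randomly initialized)
b = np.zeros(a)
context = rng.normal(size=a)       # trainable attention (context) vector

u = np.tanh(H @ W + b)             # step 2: projected vectors, shape (T, a)
scores = u @ context               # step 3: alignment score per time step, shape (T,)
alpha = softmax(scores)            # step 3: attention weights, sum to 1
sentence_vec = alpha @ H           # step 4: weighted sum of the *hidden* states, shape (d,)
```

Note that step 4 multiplies the weights back into `H`, not into `u` — that is exactly the point of contention in this thread.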

0 reactions
momih commented, Jun 11, 2018

@krayush07 I think it’s more of a personal choice where to apply the attention weights. In the paper, the authors project the hidden state to the same dimension, then compute attention and apply it to the hidden state. In this implementation, however, he projects the hidden state to a lower dimension to compute attention. So I’m guessing he applies attention to the projected vector instead because he wants a lower dimension for the encoded sentence vector.
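The two choices can be compared side by side in a minimal sketch (random weights and illustrative shapes only; nothing here is taken from the repository). The attention weights are identical in both cases; only the vectors being summed differ, which changes the dimensionality of the result:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
T, d, a = 5, 8, 4                          # time steps, hidden size, attention size
H = rng.normal(size=(T, d))                # RNN hidden states
u = np.tanh(H @ rng.normal(size=(d, a)))   # projected vectors, shape (T, a)
alpha = softmax(u @ rng.normal(size=a))    # attention weights over time steps

paper_vec = alpha @ H   # paper: weighted sum of hidden states    -> shape (d,)
impl_vec  = alpha @ u   # this repo: weighted sum of projections  -> shape (a,)
```

With `a < d`, summing the projected vectors yields a smaller encoded sentence vector, which matches the motivation suggested in this comment.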
