Attention layer output
The method task_specific_attention
applies attention to the projected vectors instead of the hidden vectors (the outputs of the RNN cell).
Is this intentional, or has the attention formulation from the paper been missed? In the paper, the final sentence vector is a weighted sum of the hidden states, NOT of the inner projected vectors.
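For reference, the formulation the question alludes to (from the Hierarchical Attention Networks paper) projects each hidden state only to score it, then weights the hidden states themselves. A minimal NumPy sketch of that version (all names and shapes here are illustrative, not taken from the repository's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def paper_attention(hidden, W, b, u_w):
    """Paper-style attention: u_t = tanh(W h_t + b),
    alpha = softmax(u_t . u_w), and the sentence vector is
    sum_t alpha_t * h_t -- the weights are applied to the
    HIDDEN states, so the output keeps the hidden dimension."""
    u = np.tanh(hidden @ W + b)                   # (T, attn_dim) projected vectors
    alpha = softmax(u @ u_w, axis=0)              # (T,) attention weights
    return (alpha[:, None] * hidden).sum(axis=0)  # (hidden_dim,)

rng = np.random.default_rng(0)
T, hidden_dim, attn_dim = 5, 8, 4
hidden = rng.normal(size=(T, hidden_dim))  # stand-in for RNN outputs
W = rng.normal(size=(hidden_dim, attn_dim))
b = np.zeros(attn_dim)
u_w = rng.normal(size=attn_dim)            # context vector

s = paper_attention(hidden, W, b, u_w)
print(s.shape)  # (8,) -- same dimension as the hidden states
```

Here the projection `u` is used only to compute the weights `alpha`; the summation runs over `hidden`, which is the point of contention in this issue.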
Issue Analytics
- Created: 6 years ago
- Reactions: 4
- Comments: 5
Top Results From Across the Web
- A Beginner's Guide to Using Attention Layer in Neural Networks: A layer that can help a neural network memorize long sequences of information or data can be considered an attention...
- Attention layer - Keras: Dot-product attention layer, a.k.a. Luong-style attention. ... Returns the attention scores (after masking and softmax) as an additional output argument.
- A Brief Overview of Attention Mechanism | by Synced - Medium: Attention is simply a vector, often the output of a dense layer using a softmax function. Before the attention mechanism, translation relied on ...
- Adding a Custom Attention Layer to a Recurrent Neural Network: This tutorial shows how to add a custom attention layer to a network built using a recurrent neural network. We'll illustrate an end-to-end ...
- Attention (machine learning) - Wikipedia: In artificial neural networks, attention is a technique that is meant to mimic cognitive attention. The effect enhances some parts of the input...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@heisenbugfix I agree with what you mentioned in the previous comment. However, the final attention is applied to the hidden states, NOT to the projected vector.
As per my understanding, these are the steps to apply attention:
I find a mismatch at the 4th step in your code. Please correct me if I am wrong.
@krayush07 I think it's more of a personal choice where to apply the attention weights. In the paper, the authors project the hidden state to the same dimension, compute attention, and apply it to the hidden state. In this implementation, however, the hidden state is projected to a lower dimension to compute attention. So I'm guessing he applies attention to the projected vector instead because he wants a lower dimension for the encoded sentence vector.
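The variant described in this comment can be sketched the same way: the weights are computed identically, but the summation runs over the projected vectors, so the encoded sentence vector takes the smaller projection dimension. Again a hedged NumPy sketch with illustrative names, not the repository's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def projected_attention(hidden, W, b, u_w):
    """Implementation-style variant: attention weights are applied to
    the projected vectors u_t rather than the hidden states h_t, so
    the output has the (lower) projection dimension."""
    u = np.tanh(hidden @ W + b)              # (T, attn_dim)
    alpha = softmax(u @ u_w, axis=0)         # (T,) attention weights
    return (alpha[:, None] * u).sum(axis=0)  # (attn_dim,)

rng = np.random.default_rng(0)
T, hidden_dim, attn_dim = 5, 8, 4
hidden = rng.normal(size=(T, hidden_dim))
W = rng.normal(size=(hidden_dim, attn_dim))
b = np.zeros(attn_dim)
u_w = rng.normal(size=attn_dim)

v = projected_attention(hidden, W, b, u_w)
print(v.shape)  # (4,) -- the lower projection dimension, not (8,)
```

Both variants use the same scores; the only difference is which vectors the weighted sum runs over, which is exactly the trade-off (fidelity to the paper vs. a smaller sentence encoding) debated in this thread.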