How to add Attention on top of a Recurrent Layer (Text Classification)
I am doing text classification. I am using my own pre-trained word embeddings, and I have an LSTM layer on top with a softmax at the end.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
from keras.regularizers import activity_l2  # Keras 1.x import

vocab_size = embeddings.shape[0]
embedding_size = embeddings.shape[1]

model = Sequential()
# frozen embedding layer, initialized with the pre-trained vectors
model.add(Embedding(
    input_dim=vocab_size,
    output_dim=embedding_size,
    input_length=max_length,
    trainable=False,
    mask_zero=True,
    weights=[embeddings]
))
model.add(LSTM(200, return_sequences=False))  # only the last hidden state
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax', activity_regularizer=activity_l2(0.0001)))
Pretty simple. Now I want to add attention to the model, but I don't know how to do it.
My understanding is that I have to set return_sequences=True so that the attention layer can weight each timestep accordingly. That way the LSTM will return a 3D tensor, right?
After that, what do I have to do?
Is there a way to easily implement a model with attention using the available Keras layers, or do I have to write my own custom layer?
If this can be done with the available Keras layers, I would really appreciate an example.
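To make sure I understand the shapes, the change I mean would just be replacing the LSTM line above with the following; the comments show the output shapes as I expect them:

# return_sequences=False gives (batch_size, 200): the last hidden state only
# return_sequences=True  gives (batch_size, max_length, 200): one 200-d vector per timestep
model.add(LSTM(200, return_sequences=True))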
@baziotis This area is supposed to be more for bugs, as opposed to "how to implement" questions. I admit I don't often look at the Google group, but that is a valid place to ask these questions, as is the Slack channel.
Bengio et al. have a pretty good paper on attention (soft attention is the softmax-based variant).
An example of method a) I described (keep the full output sequence and let the higher layers apply the attention):
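Roughly, something along these lines (a sketch in the Keras 1.x functional API; the 200 units match the model above, while the scoring layer and all names are just illustrative):

from keras.layers import Input, LSTM, TimeDistributed, Dense, Activation, Flatten, RepeatVector, Permute, merge

inputs = Input(shape=(max_length, embedding_size))
# full sequence of hidden states: (batch, max_length, 200)
activations = LSTM(200, return_sequences=True)(inputs)

# one unnormalized score per timestep: (batch, max_length, 1)
scores = TimeDistributed(Dense(1, activation='tanh'))(activations)
scores = Flatten()(scores)                # (batch, max_length)
weights = Activation('softmax')(scores)   # attention weights over timesteps, sum to 1
weights = RepeatVector(200)(weights)      # (batch, 200, max_length)
weights = Permute((2, 1))(weights)        # (batch, max_length, 200)

# reweight every timestep; the output is still a sequence, so higher
# recurrent or TimeDistributed layers can keep consuming it
attended = merge([activations, weights], mode='mul')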
Example b), with a simple activation (collapse the weighted sequence into a single vector, for a single output):
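Again only a sketch, reusing the attended tensor from the method a) snippet above; the weighted timesteps are summed into one sentence vector, which fits your single-label setup:

from keras.layers import Lambda, Dense
import keras.backend as K

# collapse the weighted sequence into one vector: (batch, 200)
sentence = Lambda(lambda x: K.sum(x, axis=1), output_shape=(200,))(attended)
predictions = Dense(3, activation='softmax')(sentence)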
Example b) with a sigmoid and then a softmax (non-working as-is, but it shows the idea):
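A sketch of that variant (again not a working recipe, just the idea of gating each timestep before renormalizing; activations is the LSTM output sequence from the earlier sketch):

from keras.layers import TimeDistributed, Dense, Flatten, Activation

# per-timestep gate in (0, 1) via sigmoid: (batch, max_length, 1)
gates = TimeDistributed(Dense(1, activation='sigmoid'))(activations)
gates = Flatten()(gates)                 # (batch, max_length)
# renormalize the gates so they sum to 1 across timesteps
weights = Activation('softmax')(gates)
# ... then reweight and sum as in the sketches above ...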
In addition, I should say that my notes about whether a) or b) above is what you probably need are based on your example, where you want a single output (making option b probably the correct way). Attention is often used in settings like caption generation, where there is more than one output and you set return_sequences=True. For those cases, I think option a) is the usual approach: the recurrency keeps all the information flowing forward, and it is only the higher layers that make use of the attention.

@patyork, I'm sorry, but I don't see how this implements attention at all?
From my understanding, the softmax in the Bengio et al. paper is not applied over the LSTM output, but over the output of an attention model, which is calculated from the LSTM’s hidden state at a given timestep. The output of the softmax is then used to modify the LSTM’s internal state. Essentially, attention is something that happens within an LSTM since it is both based on and modifies its internal states.
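Concretely, the computation I mean looks roughly like this (a plain numpy sketch; the matrices W_a, U_a, v_a are illustrative names, not taken from the paper's code):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, n, m, d = 10, 8, 6, 16     # timesteps, annotation size, hidden-state size, attention size
H = np.random.randn(T, n)     # encoder annotations h_1 .. h_T
s = np.random.randn(m)        # LSTM hidden state at the current timestep
W_a = np.random.randn(d, m)   # projects the hidden state
U_a = np.random.randn(d, n)   # projects each annotation
v_a = np.random.randn(d)

# alignment score for each timestep: e_t = v_a . tanh(W_a s + U_a h_t)
e = np.array([v_a @ np.tanh(W_a @ s + U_a @ h) for h in H])
alpha = softmax(e)            # the softmax is over these scores, not over the LSTM output
context = alpha @ H           # context vector that feeds back into the LSTM step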
I actually made my own attempt to create an attentional LSTM in Keras, based on the very same paper you cited, which I’ve shared here:
https://gist.github.com/mbollmann/ccc735366221e4dba9f89d2aab86da1e
There are several different ways to incorporate attention into an LSTM, and I won’t claim 100% correctness of my implementation (though I’d appreciate any hints if something seems terribly wrong!), but I’d be surprised if it was as simple as adding a softmax activation.