Location sensitive attention shouldn't use preprocessed memory?
Keith,
I see that you’re also using the keys (preprocessed memory) to compute the location-sensitive attention scores. Shouldn’t it be just the processed query and the processed location?
Top GitHub Comments
Yes – sorry for the confusion. I’ll update the naming and comments to be more clear about this.
This is really nice code. Thanks very much.
So, if I can clarify this (apologies in advance if it is stating the obvious …): equation (8), f_i = F ∗ α_{i−1}, corresponds to
f = self.location_conv(expanded_alignments) # [N, T_in, 10]
with ‘k’ being the number of filters and ‘r’ being the filter size.
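For concreteness, here is a minimal runnable sketch of that first step. The sizes are assumptions: k = 10 just matches the [N, T_in, 10] shape comment above, and r = 31 is picked only for illustration; the Conv1D layer here is a stand-in for whatever location_conv is in this repo.

```python
import tensorflow as tf

# Eq. (8): f_i = F * alpha_{i-1} -- convolve the previous alignments with k filters of width r.
k, r = 10, 31  # assumed values; k matches the [N, T_in, 10] shape comment above

location_conv = tf.keras.layers.Conv1D(
    filters=k, kernel_size=r, padding="same", use_bias=False)

# Previous alignments alpha_{i-1}: [N, T_in] -> [N, T_in, 1] so Conv1D sees a 1-channel signal.
previous_alignments = tf.random.uniform([2, 100])
expanded_alignments = tf.expand_dims(previous_alignments, axis=2)

f = location_conv(expanded_alignments)   # [N, T_in, k] -> (2, 100, 10)
```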
The second operation corresponds to equation (9): e_{i,j} = v^T tanh(W s_{i−1} + V h_j + U f_{i,j} + b). Does the U f_{i,j} term correspond to the code below?
processed_location = self.location_layer(f) # [N, T_in, num_units]
with equation (9) then being computed as
return tf.reduce_sum(v * tf.tanh(keys + processed_query + processed_location), [2])
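Putting both steps together, here is a self-contained sketch of how equation (9) could be computed. The layer names and sizes (num_units = 128, a separate query_layer playing the role of W) are assumptions mirroring the snippets quoted above, not necessarily how this repo’s attention class is wired.

```python
import tensorflow as tf

num_units = 128  # assumed hidden size

# U in Eq. (9): project the k location features up to num_units.
location_layer = tf.keras.layers.Dense(num_units, use_bias=False)
# W in Eq. (9): project the decoder state s_{i-1}; its bias plays the role of b.
query_layer = tf.keras.layers.Dense(num_units, use_bias=True)
# v in Eq. (9).
v = tf.Variable(tf.random.normal([num_units]))

def location_sensitive_score(keys, query, f):
    """e_{i,j} = v^T tanh(W s_{i-1} + V h_j + U f_{i,j} + b).

    keys:  [N, T_in, num_units] -- V h_j, the preprocessed memory
    query: [N, decoder_dim]     -- the decoder state s_{i-1}
    f:     [N, T_in, k]         -- location features from Eq. (8)
    Returns energies of shape [N, T_in].
    """
    processed_query = tf.expand_dims(query_layer(query), axis=1)  # [N, 1, num_units]
    processed_location = location_layer(f)                        # [N, T_in, num_units]
    return tf.reduce_sum(v * tf.tanh(keys + processed_query + processed_location), axis=[2])
```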
The paper says: “First, we extract k vectors f_{i,j} ∈ R^k for every position j of the previous alignment α_{i−1} by convolving it with a matrix F ∈ R^{k×r}.”
I am assuming that we arbitrarily choose the size of these vectors to be the hidden dim, as the paper does not specify what this size should be, only that we need ‘k’ such vectors.
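A quick shape check under that assumption, reusing the hypothetical layers from the sketches above (all sizes are illustrative):

```python
keys = tf.random.normal([2, 100, num_units])   # preprocessed memory V h_j, [N, T_in, num_units]
query = tf.random.normal([2, 256])             # decoder state s_{i-1}; 256 is an assumed size

f = location_conv(expanded_alignments)         # [N, T_in, k] -> (2, 100, 10)
energies = location_sensitive_score(keys, query, f)
print(energies.shape)                          # (2, 100): one score e_{i,j} per encoder position j

alignments = tf.nn.softmax(energies, axis=1)   # alpha_i, used as alpha_{i-1} at the next decoder step
```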