About the Duplex attention
Hi, thanks for sharing the code!
I have a few questions about Section 3.1.2. Duplex attention.
- I am confused by the notation in this section. For example, “Y = (K^{P×d}, V^{P×d}), where the values store the content of the Y variables (e.g. the randomly sampled latents for the case of GAN)”. Does this mean that V^{P×d} is sampled from the original variable Y? And how do you set the number P in your code?
- “Keys track the centroids of the attention-based assignments from X to Y, which can be computed as K = a_b(Y, X)”. Does this mean K is calculated with the self-attention module but with (Y, X) as input? If so, how should I understand “the keys track the centroids of the attention-based assignments from X to Y”? And how are the centroids obtained?
- In the update rule of duplex attention, what does the a() function mean? Does it denote a self-attention module like a_b() in Section 3.1.1, with X as queries, K as keys, and V as values? If so, since K is itself computed by another self-attention module (as in question 2), the output of a_b(Y, X) is treated as the keys, so the update rule contains two self-attention operations. Is that right? Is that why it is called “duplex” attention?
- However, I suspect I may be wrong, given the last paragraph of this section: “to support bidirectional interaction between elements, we can chain two reciprocal simplex attentions from X to Y and from Y to X, obtaining the duplex attention”. Does this mean we first update Y with a simplex attention module u^a(Y, X), and then use this Y as input to u^d(X, Y) to update X? Would the duplex attention module then contain three self-attention operations in total?
Thanks a lot! 😃
Issue Analytics
- State:
- Created: 2 years ago
- Reactions: 4
- Comments: 7 (3 by maintainers)
Thanks a lot for your detailed reply! Now I understand the core idea of the duplex attention part.
Thank you! 😃
Hi all!

@07hyx06 Yep, that's correct! We first find the centroids by casting attention over the image features (X), and then update the features based on the centroids (K).

@nicolas-dufour That's right: the values are not iteratively updated, only the centroids and the image features!

@subminu Thanks so much for pointing that out! I'll update the paper with that fix!
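To make sure I follow: across iterations only K and X are refined while V stays fixed, as in this toy NumPy sketch (my own simplification, not the repository's implementation, with no learned projections):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attn(q, k, v):
    # Plain scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(1)
n, P, d = 5, 3, 4
x = rng.normal(size=(n, d))   # image features X
k = rng.normal(size=(P, d))   # keys / centroids of Y
v = rng.normal(size=(P, d))   # values of Y (stored latents)
v0 = v.copy()

# Across layers, only the centroids K and the features X are refined;
# the values V are never reassigned.
for _ in range(3):
    k = attn(k, x, x)   # centroids follow the X -> Y assignments
    x = attn(x, k, v)   # features updated from the fixed values

assert np.allclose(v, v0)  # V was never modified
```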