`kxd` matrix or `1xd` vector?

Section 3 of the paper ‘Augmenting Convolutional networks with attention-based aggregation’ says:

"We can easily specialize the attention maps per class by replacing the CLS vector with a k × d matrix, where each of the k columns is associated with one of the classes. This specialization allows us to visualize an attention map for each class, as shown in Figure 2."

But in the code I only found a 1 × d vector. Where is the k × d matrix?

https://github.com/facebookresearch/deit/blob/40ae72b79cc5cd48dac2b02e1fceb03ee4192676/patchconvnet_models.py#L201

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

jegou commented, Jan 1, 2022 (1 reaction)

This design is introduced as a straightforward variation of our learned aggregation layer. And yes: mostly for visualization.

In Figure 1, most of our results use a single token. I am sorry if you feel that we spent ‘much time’ on it; I actually thought we were presenting it as a straightforward variation of our main proposal: only the sentence ‘We can specialize this attention per class’ in Figure 1 for the last column, and a short paragraph in Section 3, where we point out some limitations of this variant. Maybe the confusing part is Fig. 3, where we could easily explain the tiny difference between both choices.

So yes, our paper has a significant focus on providing visualization and an interpretable aggregation layer for convnets; this is also true for the single-token version.

The choice of the single token is driven by its more direct relationship to this objective, as it provides the weight of each patch in the aggregation. The one-token-per-class variant provides a weight per patch for each class, independent of whether the class is selected by the classifier. While both have their pros and cons, we feel that they are complementary for visualization purposes.

Hervé
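
To make the difference between the two variants concrete, here is a minimal NumPy sketch of attention pooling with a 1 × d query versus a k × d query. It is an illustration only, not the code from patchconvnet_models.py: the function `attention_pool`, the sizes `n`, `d`, `k`, and the random inputs are all hypothetical, and the learned query/key/value projections and multiple heads of a real aggregation layer are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(queries, patches):
    """Single-head attention pooling without learned projections (simplified).

    queries: (q, d) array; q = 1 for a single CLS vector, q = k for one token per class.
    patches: (n, d) array of patch embeddings.
    Returns the pooled vectors (q, d) and the attention weights (q, n).
    """
    d = patches.shape[1]
    scores = queries @ patches.T / np.sqrt(d)  # (q, n) similarity of each query to each patch
    weights = softmax(scores, axis=-1)         # each row sums to 1 over the n patches
    pooled = weights @ patches                 # (q, d) weighted average of the patches
    return pooled, weights

rng = np.random.default_rng(0)
n, d, k = 196, 64, 10  # hypothetical: 196 patches, 64 channels, 10 classes
patches = rng.normal(size=(n, d))

# Single-token variant: one 1 x d query gives one attention map over the patches.
cls_vector = rng.normal(size=(1, d))
pooled_one, attn_one = attention_pool(cls_vector, patches)

# Per-class variant: a k x d query matrix gives one attention map per class.
class_tokens = rng.normal(size=(k, d))
pooled_k, attn_k = attention_pool(class_tokens, patches)

print(attn_one.shape)  # (1, 196)
print(attn_k.shape)    # (10, 196)
```

In this sketch the only change between the two variants is the number of query rows: the single token yields one weight per patch, while the k × d matrix yields one weight per patch for each class, regardless of which class the classifier ends up predicting.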

densechen commented, Jan 8, 2022 (0 reactions)

@TouvronHugo Thanks for your kind reply. Your detailed explanation helped me understand the whole process better. Thanks!
