Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might look while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Feature extraction in torchvision.models.vit_b_16

See original GitHub issue

🐛 Describe the bug

Hi

It’s easy enough to obtain output features from the CNNs in torchvision.models by doing this:

import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18()
# Drop the final fc layer; everything up to (and including) the global
# average pool is kept, so this yields a pooled feature map.
feature_extractor = nn.Sequential(*list(model.children())[:-1])
output_features = feature_extractor(torch.randn(1, 3, 224, 224))

However, when I attempt to do this with torchvision.models.vit_b_16:

import torch
import torch.nn as nn
import torchvision.models as models

model = models.vit_b_16()
# Same trick: drop the classification head, keep conv_proj and the encoder.
feature_extractor = nn.Sequential(*list(model.children())[:-1])
output_features = feature_extractor(torch.randn(1, 3, 224, 224))

I get the following error:

AssertionError: Expected (batch_size, seq_length, hidden_dim) got torch.Size([1, 768, 14, 14])

Any help would be greatly appreciated.
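For context, the assertion is raised inside the Transformer encoder: the nn.Sequential built from model.children() passes the raw conv_proj output, a 1 x 768 x 14 x 14 patch grid, straight to the encoder, whereas VisionTransformer.forward first flattens the patches into a token sequence and prepends the class token. A rough sketch of those skipped steps, assuming the attribute names used by torchvision 0.12's VisionTransformer (conv_proj, class_token, encoder):

import torch
import torchvision.models as models

model = models.vit_b_16()
model.eval()

x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    # Patchify: (1, 3, 224, 224) -> (1, 768, 14, 14); this is the tensor the
    # nn.Sequential approach hands directly to the encoder.
    feats = model.conv_proj(x)

    # Flatten the 14x14 patch grid into a token sequence (1, 196, 768),
    # then prepend the learnable class token -> (1, 197, 768).
    feats = feats.flatten(2).transpose(1, 2)
    class_token = model.class_token.expand(x.shape[0], -1, -1)
    feats = torch.cat([class_token, feats], dim=1)

    # Run the encoder and keep the class-token embedding as the feature vector.
    tokens = model.encoder(feats)
    output_features = tokens[:, 0]  # shape (1, 768)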

Versions

Torch version: 1.11.0+cu102
Torchvision version: 0.12.0+cu102

cc @datumbox

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
alexander-soare commented, Apr 1, 2022

@datumbox before working on FX I was trying to structure things that way in some timm models and found myself making weird modules that only made sense in a particular context. For instance, in the ViT example you would end up wrapping the final slice operation in a module… I also tried the approach of absorbing things into modules (like absorbing the slice operation into the encoder module). Sometimes this works; sometimes you find yourself changing the module name and docstring to try to make the semantics appropriate to the new inputs/outputs of the module. A lot of the time, the detour just doesn’t feel worth it.
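Concretely, the kind of wrapper being described might look something like the sketch below, where SelectClassToken is a made-up name for a module around ViT's final x[:, 0] slice:

import torch
import torch.nn as nn

class SelectClassToken(nn.Module):
    # Hypothetical wrapper around the x[:, 0] slice at the end of ViT's
    # forward. It only does the right thing when its input is a
    # (batch, tokens, dim) sequence whose first token is the class token,
    # i.e. a module that only makes sense in one particular context.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x[:, 0]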

If the FX solution didn’t exist, I’d probably be in agreement that that’s the way to go. But with FX, problem solved, no?
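The FX solution referred to here is the torchvision.models.feature_extraction utility. A minimal sketch, assuming torchvision 0.12, where 'encoder.ln' (the encoder's final LayerNorm) is the node to tap; if the name differs in another release, the available node names can be listed with get_graph_node_names:

import torch
from torchvision.models import vit_b_16
from torchvision.models.feature_extraction import (
    create_feature_extractor,
    get_graph_node_names,
)

model = vit_b_16()

# List the traceable node names; useful if 'encoder.ln' is named differently
# in your torchvision version.
train_nodes, eval_nodes = get_graph_node_names(model)
print(eval_nodes[-5:])

# Tap the output of the encoder's final LayerNorm: a (1, 197, 768) token
# sequence for a 224x224 input.
feature_extractor = create_feature_extractor(model, return_nodes={"encoder.ln": "tokens"})

out = feature_extractor(torch.randn(1, 3, 224, 224))
tokens = out["tokens"]          # (1, 197, 768)
output_features = tokens[:, 0]  # (1, 768) class-token embedding

Keeping only the first token matches what the classification head consumes; keeping the full token sequence gives per-patch features instead.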

1 reaction
datumbox commented, Apr 1, 2022

@DavidTorpey thanks a lot for reporting!

@alexander-soare I wonder if you have the bandwidth to have a look?

Read more comments on GitHub >

Top Results From Across the Web

  • Feature extraction for model inspection - PyTorch
    The torchvision.models.feature_extraction package contains feature extraction utilities that let us tap into our models to access intermediate ...
  • How to Extract Features - VISSL 0.1.6 documentation
    Given a pre-trained model, VISSL makes it easy to extract the features for the model on the datasets. VISSL seamlessly supports TorchVision models...
  • Vision Transformer (ViT) - Hugging Face
    BEiT models outperform supervised pre-trained vision transformers using a self-supervised method inspired by BERT (masked image modeling) and based on a VQ-VAE.
  • Feature Extraction - Pytorch Image Models - GitHub Pages
    All of the models in timm have consistent mechanisms for obtaining various types of features from the model for tasks besides ...
  • Feature Extraction With TorchVision's Newest Utility - YouTube
    In this video I walk you through how to use Torchvision's new feature extraction utility. Questions welcome in the comments!
