Using different encoders in CLIP

See original GitHub issue

Hi, I am wondering whether it is possible to use different encoders in CLIP, for example a ResNet instead of a ViT for images. Is it also possible to replace the text encoder with a feature encoder? If I have a feature vector for a given image and want to use x-clip, how should I do that? I wrote a code example that does not seem to work; here is what I did:

import torch
import torch.nn as nn
from torchvision import models
from x_clip import CLIP


class ImageEncoder(nn.Module):
    # ResNet-18 trunk without the final fc layer; output size is (bs, 512)
    def __init__(self):
        super().__init__()
        base = list(models.resnet18(pretrained=False).children())
        # re-create the first conv (same configuration as the stock ResNet-18 stem)
        base[0] = nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        # drop the classification head, keep everything up to global average pooling
        self.resnet = nn.Sequential(*base[:-1])

    def forward(self, x):
        # (bs, 512, 1, 1) -> (bs, 512); flatten keeps the batch axis even when bs == 1
        return self.resnet(x).flatten(1)


class FeatureEncoder(nn.Module):
    # projects a precomputed feature vector; output size is (bs, 512)
    def __init__(self):
        super().__init__()
        self.model = nn.Linear(2048, 512)

    def forward(self, x):
        return self.model(x)


# distinct instance names so the classes are not shadowed
image_encoder = ImageEncoder()
feature_encoder = FeatureEncoder()

clip = CLIP(
    image_encoder = image_encoder,
    text_encoder = feature_encoder,
    dim_image = 512,
    dim_text = 512,
    dim_latent = 512
)

features = torch.randn(4, 2048)
images = torch.randn(4, 3, 256, 256)

# the feature vectors are passed in the position where x-clip expects text
loss = clip(features, images, return_loss = True)
loss.backward()

but I got the following error: forward() takes 2 positional arguments but 3 were given

Thanks
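
The error quoted above is what Python raises when a module whose `forward(self, x)` takes one input is called with two positional inputs. Below is a minimal sketch of that failure and one way around it. It assumes the wrapping model hands the text encoder a second positional argument such as a padding mask, which the issue text does not confirm. `FeatureEncoder`, `IgnoreExtraArgs`, and the dummy `mask` are hypothetical names used for illustration and are not part of x-clip.

```python
import torch
import torch.nn as nn


class FeatureEncoder(nn.Module):
    """Projects a precomputed feature vector (bs, 2048) to (bs, 512)."""

    def __init__(self, dim_in=2048, dim_out=512):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)

    def forward(self, x):
        return self.proj(x)


class IgnoreExtraArgs(nn.Module):
    """Hypothetical adapter: forwards only the first input to the wrapped module."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x, *args, **kwargs):
        # Extra inputs (assumed here to be a mask) are accepted and discarded.
        return self.module(x)


features = torch.randn(4, 2048)
mask = torch.ones(4, dtype=torch.bool)  # stand-in for whatever extra argument is passed

encoder = FeatureEncoder()

# Calling the plain encoder with two positional inputs raises a TypeError.
try:
    encoder(features, mask)
    raised = False
except TypeError as err:
    raised = True
    message = str(err)

# The adapter tolerates the extra argument and returns the projected features.
wrapped = IgnoreExtraArgs(encoder)
out = wrapped(features, mask)
```

If the encoder should actually use the extra argument, accepting it explicitly in `forward` is the cleaner choice; discarding it only makes sense when the inputs are fixed-size vectors with nothing to mask.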

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
ethancohen123 commented, Aug 8, 2022

Works just fine, thanks!

1 reaction
ethancohen123 commented, Aug 5, 2022

Works just fine, thanks 😃 However, setting visual SSL to True now returns the following error:
EinopsError: Error while processing rearrange-reduction pattern "b n d -> (b n) d". Input tensor shape: torch.Size([2, 512]). Additional info: {}. Expected 3 dimensions, got 2

Sorry about the trouble aha
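
The pattern in that error, `b n d -> (b n) d`, expects a (batch, tokens, dim) tensor, whereas a pooled ResNet output is (batch, dim). A minimal sketch of one possible workaround is to add a length-1 token axis. It assumes a single pseudo-token per image is acceptable to the downstream code, which the thread does not confirm. `as_token_sequence` is a hypothetical helper, and the `reshape` call stands in for the einops pattern.

```python
import torch


def as_token_sequence(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn pooled (b, d) embeddings into (b, 1, d)."""
    if x.dim() == 2:
        return x.unsqueeze(1)
    return x


pooled = torch.randn(2, 512)        # shape reported in the error message
tokens = as_token_sequence(pooled)  # (2, 1, 512)

# Equivalent of the einops pattern "b n d -> (b n) d", written with reshape.
b, n, d = tokens.shape
flat = tokens.reshape(b * n, d)     # (2, 512)
```

An alternative, also untested here, is to skip the pooling step and return the spatial feature map as a token sequence, so that each image contributes more than one token.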
