Problem related to encoding text
I am trying to use a resnet50 model that I created with this repo, but I can’t encode text.
with torch.no_grad():
    tmp = clip.tokenize("test")
    tmp = tmp.to(device)
    print(tmp)
    print(tmp.shape)
    text_encoded = model.model.encode_text(tmp)
tensor([[49406,  1628, 49407,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0]], device='cuda:0')
torch.Size([1, 77])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-68003eb3bebb> in <module>()
      9     print(tmp)
     10     print(tmp.shape)
---> 11     text_encoded = model.model.encode_text(tmp)
     12 
2 frames
/content/train-CLIP/models/model.py in encode_text(self, text)
    343         x = x + self.positional_embedding.type(self.dtype)
    344         x = x.permute(1, 0, 2)  # NLD -> LND
--> 345         x = self.transformer(x)
    346         x = x.permute(1, 0, 2)  # LND -> NLD
    347         x = self.ln_final(x).type(self.dtype)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    937         elif input_ids is not None:
    938             input_shape = input_ids.size()
--> 939             batch_size, seq_length = input_shape
    940         elif inputs_embeds is not None:
    941             input_shape = inputs_embeds.size()[:-1]
ValueError: too many values to unpack (expected 2)
Printing x before self.transformer(x) results in torch.Size([77, 1, 512]).
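The failure in the traceback can be reproduced without any model. This is a minimal sketch that uses plain tuples as stand-ins for the tensor shapes. The helper name `unpack_input_shape` is hypothetical and only mirrors the `batch_size, seq_length = input_shape` line shown in the traceback. The code that unpacks the shape expects a 2-D `(batch, sequence)` input, while the tensor reaching it here is already embedded and 3-D.

```python
def unpack_input_shape(shape):
    # Same unpacking as the line the traceback points at:
    # it only succeeds when the shape has exactly two dimensions.
    batch_size, seq_length = shape
    return batch_size, seq_length


token_ids_shape = (1, 77)       # shape printed for the tokenized text
embedded_shape = (77, 1, 512)   # shape printed just before self.transformer(x)

# A 2-D shape of token ids unpacks cleanly.
ok = unpack_input_shape(token_ids_shape)

# A 3-D shape of embeddings raises the ValueError seen in the traceback.
try:
    unpack_input_shape(embedded_shape)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

This suggests the transformer in this model expects raw token ids rather than the embedded, permuted tensor that this `encode_text` passes to it.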
The input shape torch.Size([1, 77]) matches what the original CLIP code produces, and the model loaded with clip works without major problems:
import torch
import clip
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
image = preprocess(Image.open("/test.png")).unsqueeze(0).to(device)
text = clip.tokenize(["test"]).to(device)
print(text)
print(text.shape)
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()
tensor([[49406,  1628, 49407,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0]], device='cuda:0')
torch.Size([1, 77])
Not sure what I am doing wrong, since encoding images does seem to work fine with this repo.
with torch.no_grad():
    photos_features = model.model.encode_image(image)
    photos_features /= photos_features.norm(dim=-1, keepdim=True)
print(photos_features.shape)
torch.Size([1, 768])
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (4 by maintainers)

Happy to help!
I did not notice that there are 2 functions called encode_text and assumed there is only one. It seems to work, thank you.
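One way to spot this kind of name collision is to walk a chain of wrapped objects and report every level that defines the method. The sketch below uses hypothetical stand-in classes (`Wrapper`, `Inner`) and a hypothetical helper (`find_methods`), not the repo's real classes. The thread does not say which object holds the second encode_text, so the `.model` attribute chain is an assumption taken from the `model.model.encode_text` call above.

```python
def find_methods(obj, name, max_depth=3):
    """Follow obj, obj.model, obj.model.model, ... and list where `name` is callable."""
    found = []
    path = "obj"
    current = obj
    for _ in range(max_depth):
        if callable(getattr(current, name, None)):
            found.append(path)
        nxt = getattr(current, "model", None)
        if nxt is None:
            break
        current = nxt
        path += ".model"
    return found


class Inner:
    # Hypothetical inner model with its own encode_text.
    def encode_text(self, text):
        return "inner"


class Wrapper:
    # Hypothetical wrapper that also defines encode_text.
    def __init__(self):
        self.model = Inner()

    def encode_text(self, inputs):
        return "wrapper"


locations = find_methods(Wrapper(), "encode_text")
print(locations)
```

When more than one location is reported, it is worth checking which of them matches the tokenizer and input format the model was trained with.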