Problem related to encoding text
I am trying to use a ResNet-50 model that I created with this repo, but I can't encode text:
```python
with torch.no_grad():
    tmp = clip.tokenize("test")
    tmp = tmp.to(device)
    print(tmp)
    print(tmp.shape)
    text_encoded = model.model.encode_text(tmp)
```
```
tensor([[49406,  1628, 49407,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0]], device='cuda:0')
torch.Size([1, 77])
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-68003eb3bebb> in <module>()
      9 print(tmp)
     10 print(tmp.shape)
---> 11 text_encoded = model.model.encode_text(tmp)
     12

2 frames
/content/train-CLIP/models/model.py in encode_text(self, text)
    343         x = x + self.positional_embedding.type(self.dtype)
    344         x = x.permute(1, 0, 2)  # NLD -> LND
--> 345         x = self.transformer(x)
    346         x = x.permute(1, 0, 2)  # LND -> NLD
    347         x = self.ln_final(x).type(self.dtype)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    937         elif input_ids is not None:
    938             input_shape = input_ids.size()
--> 939             batch_size, seq_length = input_shape
    940         elif inputs_embeds is not None:
    941             input_shape = inputs_embeds.size()[:-1]

ValueError: too many values to unpack (expected 2)
```
Printing `x` right before `self.transformer(x)` gives `torch.Size([77, 1, 512])`, so a 3-D tensor of embeddings is being passed where the BERT `forward` in the traceback expects 2-D `input_ids`.
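To see why that shape triggers the error, here is a minimal, illustrative reproduction with plain tensors (not the repo's model):

```python
import torch

# BERT's forward runs `batch_size, seq_length = input_shape`, which only
# succeeds for a 2-D input_ids tensor of shape (batch, seq_len).
ids_2d = torch.zeros(1, 77, dtype=torch.long)
batch_size, seq_length = ids_2d.size()  # fine: (1, 77)

x_3d = torch.zeros(77, 1, 512)  # the shape handed to self.transformer(x)
try:
    batch_size, seq_length = x_3d.size()
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)
```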
The input shape `torch.Size([1, 77])` matches the original CLIP code, and a model loaded with `clip` works without major problems:
```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)

image = preprocess(Image.open("/test.png")).unsqueeze(0).to(device)
text = clip.tokenize(["test"]).to(device)
print(text)
print(text.shape)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()
```
```
tensor([[49406,  1628, 49407,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0]], device='cuda:0')
torch.Size([1, 77])
```
Not sure what I am doing wrong, since encoding images works fine with this repo:
```python
with torch.no_grad():
    photos_features = model.model.encode_image(image)
    photos_features /= photos_features.norm(dim=-1, keepdim=True)
    print(photos_features.shape)
```

```
torch.Size([1, 768])
```
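Since `photos_features` is already L2-normalized, the usual CLIP-style comparison reduces to a matrix product once text encoding works. A minimal sketch, assuming `text_features` comes from a matching, working text encoder with the same embedding size:

```python
# Normalize the text features the same way, then compare by cosine
# similarity, which after L2 normalization is just a dot product.
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = photos_features @ text_features.T  # shape: [n_images, n_texts]
```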

Happy to help!

I did not notice that there are two functions called `encode_text` and assumed there was only one. It seems to work now, thank you.