Make the `CLIPTokenizer`'s `encoder_json_path` variable optional, and use `dict(zip(vocab, range(len(vocab))))` instead
🚀 Feature
https://github.com/pytorch/text/blob/main/torchtext/transforms.py#L312
In both CLIP and OpenCLIP, the encoder is simply the vocab run through `dict(zip(vocab, range(len(vocab))))`, so it doesn't make much sense to require an `encoder.json` file for this information. The `encoder.json` requirement should be optional, as the vocab file itself can be used to create the encoder, making the encoder file redundant.
https://github.com/openai/CLIP/blob/main/clip/simple_tokenizer.py#L74 https://github.com/mlfoundations/open_clip/blob/main/src/clip/tokenizer.py#L78
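For illustration, here is a minimal sketch of how the tokenizer could fall back to deriving the encoder when no `encoder.json` is supplied. The function and parameter names below are placeholders for this issue, not the actual torchtext API:

```python
import json
from typing import Dict, List, Optional


def load_encoder(vocab: List[str], encoder_json_path: Optional[str] = None) -> Dict[str, int]:
    """Return the token -> id mapping for the tokenizer.

    If an encoder.json path is given, load it as before; otherwise derive the
    mapping from the vocab, which is all CLIP/OpenCLIP do anyway.
    """
    if encoder_json_path is not None:
        with open(encoder_json_path, "r", encoding="utf-8") as f:
            return json.load(f)
    # The encoder is just the vocab enumerated in order.
    return dict(zip(vocab, range(len(vocab))))
```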
The current `clip_encoder.json` test asset is just the Python dict created by `dict(zip(vocab, range(len(vocab))))`, so while it makes a useful test case for specifying an encoder, it's redundant: https://github.com/pytorch/text/blob/main/test/asset/clip_encoder.json
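As a rough illustration of that redundancy, a hypothetical check could rebuild the mapping from the vocab and compare it to the JSON asset (the function name and arguments here are assumptions, not an existing test):

```python
import json
from typing import List


def encoder_asset_is_redundant(vocab: List[str], encoder_json_path: str) -> bool:
    """Return True if the on-disk encoder.json equals the mapping derived from
    the vocab; if it does, the asset carries no information beyond the vocab."""
    with open(encoder_json_path, "r", encoding="utf-8") as f:
        encoder_from_file = json.load(f)
    return encoder_from_file == dict(zip(vocab, range(len(vocab))))
```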
Top GitHub Comments
cc: @abhinavarora
@ProGamerGov Here is PR: #1622