
Vocab() is broken: getting errors when providing keyword arguments to function: `__init__() got an unexpected keyword argument 'min_freq'`


🐛 Bug

Describe the bug

I was working through the migration notebook to understand the new API. The Vocab() constructor seems very broken, at least in Google Colab. I get an error when I try to create a Vocab() with the min_freq=10 setting, and even after removing that setting I get TypeError: __init__() got an unexpected keyword argument 'specials'. This suggests that Vocab does not recognize any of the keyword arguments documented in the API. I was using torchtext 0.11.0 with PyTorch 1.10.0+cu111 in a Google Colab notebook.

To Reproduce

  1. Start a Google Colab notebook.
  2. Follow the migration tutorial in the torchtext repo.
  3. Run the following code, which raises the error:
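from collections import Counter
from torchtext.data.utils import get_tokenizer
from torchtext.datasets import IMDB
from torchtext.vocab import Vocab

tokenizer = get_tokenizer('basic_english')  # assumed; the tutorial defines its own tokenizer

train_iter = IMDB(split='train')
counter = Counter()
for (label, line) in train_iter:
    counter.update(tokenizer(line))
# Legacy (0.9-style) call; raises TypeError on torchtext 0.11.0
vocab = Vocab(counter, min_freq=10, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))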

This raises the following error and stack trace:

TypeError                                 Traceback (most recent call last)

<ipython-input-8-e5262609a934> in <module>()
      6 for (label, line) in train_iter:
      7     counter.update(tokenizer(line))
----> 8 vocab = Vocab(counter, min_freq=1, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))

TypeError: __init__() got an unexpected keyword argument 'min_freq'

Even after removing min_freq=1, I still get an error:

train_iter = IMDB(split='train')
counter = Counter()
for (label, line) in train_iter:
    counter.update(tokenizer(line))
vocab = Vocab(counter, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))

I get the error message:

TypeError                                 Traceback (most recent call last)

<ipython-input-13-39009faace9c> in <module>()
      6 for (label, line) in train_iter:
      7     counter.update(tokenizer(line))
----> 8 vocab = Vocab(counter, specials=('<unk>', '<BOS>', '<EOS>', '<PAD>'))

TypeError: __init__() got an unexpected keyword argument 'specials'
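Both tracebacks point at the constructor signature rather than at Colab. A minimal sketch, assuming (from my reading of the 0.10/0.11 code) that the new `torchtext.vocab.Vocab.__init__` accepts a single already-built vocab object: a stand-in class with that shape reproduces the same two errors. Python reports the first unexpected keyword it meets, which is why removing `min_freq` just moves the error on to `specials`. The class and `call_like_tutorial` are hypothetical, not torchtext code.

```python
class Vocab:
    """Hypothetical stand-in for the torchtext >= 0.10 Vocab class.

    Assumption: the new constructor only wraps an already-built vocab
    object, so it has no min_freq or specials parameters.
    """

    def __init__(self, vocab):
        self.vocab = vocab


def call_like_tutorial(**kwargs):
    """Call the stand-in the way the 0.9-era migration tutorial does."""
    counter = {"great": 12, "movie": 3}
    try:
        # The counter binds positionally to `vocab`; the keywords do not exist.
        Vocab(counter, **kwargs)
    except TypeError as exc:
        return str(exc)
    return "no error"


specials = ("<unk>", "<BOS>", "<EOS>", "<PAD>")
print(call_like_tutorial(min_freq=1, specials=specials))
print(call_like_tutorial(specials=specials))
```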

Expected behavior

This code should generate a vocabulary containing only words that occur at least 10 times.
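As far as I can tell from the 0.10/0.11 docs, the new-API counterpart of the tutorial's call is `torchtext.vocab.build_vocab_from_iterator(...)` fed an iterator of token lists, with `min_freq=10` and `specials=[...]`, followed by `vocab.set_default_index(vocab['<unk>'])`. Since that needs torchtext installed, the sketch below instead reimplements the intended semantics in plain Python: keep tokens seen at least `min_freq` times, put the specials first, and send unknown tokens to `<unk>`. `build_vocab` and `lookup` are hypothetical names, and the ordering (frequency descending, ties alphabetical) is an assumption rather than a guarantee of torchtext's behavior.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def build_vocab(
    counter: Counter,
    min_freq: int = 1,
    specials: Sequence[str] = (),
) -> Dict[str, int]:
    """Map tokens to indices, keeping only tokens seen at least min_freq times."""
    itos: List[str] = list(specials)  # specials always get the lowest indices
    kept = [
        (tok, freq)
        for tok, freq in counter.items()
        if freq >= min_freq and tok not in specials
    ]
    # Most frequent first; ties broken alphabetically (assumed ordering).
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    itos.extend(tok for tok, _ in kept)
    return {tok: idx for idx, tok in enumerate(itos)}


def lookup(stoi: Dict[str, int], tokens: Iterable[str], unk: str = "<unk>") -> List[int]:
    """Numericalize tokens, sending out-of-vocabulary tokens to the unk index."""
    default = stoi[unk]
    return [stoi.get(tok, default) for tok in tokens]


counter = Counter("a great great movie a great plot".split())
stoi = build_vocab(counter, min_freq=2, specials=("<unk>", "<BOS>", "<EOS>", "<PAD>"))
print(stoi)
print(lookup(stoi, ["great", "plot", "a"]))
```

Here "movie" and "plot" occur once, so with `min_freq=2` they fall back to the `<unk>` index, which is the filtering the report expected from `min_freq=10`.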


Environment

  • PyTorch Version (e.g., 1.0): 1.10
  • OS (e.g., Linux): Google Colab notebook with GPU
  • How you installed PyTorch (conda, pip, source): preinstalled in Colab
  • Build command you used (if compiling from source): NA
  • Python version: 3.7.2
  • CUDA/cuDNN version: unknown
  • GPU models and configuration: unknown (whatever Colab provided at the time)
  • Any other relevant information: torchtext 0.11.0


Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
parmeet commented, Nov 17, 2021

I would strongly recommend following version >= 0.10.0.

Yes, our goal is to standardize around PyTorch DataLoaders and Datasets, which is why we deprecated our legacy data abstraction APIs and rebuilt the torchtext datasets as iterable datasets. Please do follow development on the main branch, where we are also adding support for model APIs in upcoming releases. Thank you for your feedback 😃
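With the standard PyTorch data pipeline, the vocabulary usually ends up inside a `collate_fn` passed to `torch.utils.data.DataLoader`. The sketch below is a hypothetical helper (`make_collate_fn` is an illustrative name, not a torchtext API) that numericalizes and right-pads a batch of `(label, text)` pairs. It uses plain Python lists instead of tensors so it runs without PyTorch, and `str.split` stands in for a real tokenizer; in practice you would wrap the outputs in `torch.tensor(...)`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Batch = Sequence[Tuple[str, str]]  # (label, text) pairs


def make_collate_fn(
    stoi: Dict[str, int],
    tokenizer: Callable[[str], List[str]],
    pad_token: str = "<PAD>",
    unk_token: str = "<unk>",
) -> Callable[[Batch], Tuple[List[str], List[List[int]]]]:
    """Build a collate function that numericalizes and right-pads a batch."""
    pad_idx = stoi[pad_token]
    unk_idx = stoi[unk_token]

    def collate(batch: Batch) -> Tuple[List[str], List[List[int]]]:
        labels = [label for label, _ in batch]
        # Tokenize each text and map tokens to indices; unseen tokens -> unk.
        ids = [[stoi.get(tok, unk_idx) for tok in tokenizer(text)] for _, text in batch]
        # Right-pad every sequence to the longest one in this batch.
        max_len = max((len(seq) for seq in ids), default=0)
        padded = [seq + [pad_idx] * (max_len - len(seq)) for seq in ids]
        return labels, padded

    return collate


stoi = {"<unk>": 0, "<PAD>": 1, "great": 2, "movie": 3}
collate = make_collate_fn(stoi, str.split)
labels, padded = collate([("pos", "great movie"), ("neg", "boring")])
print(labels, padded)
```

Padding per batch rather than to a global maximum keeps batches of short reviews small; the same function would be passed as `DataLoader(dataset, batch_size=..., collate_fn=collate)`.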

1 reaction
00krishna commented, Nov 17, 2021

@parmeet Ahhh okay, so that makes sense. Thanks for pointing out the docs. I was confused by the differences I ran into between 0.9.0 and 0.11.0, but now they make sense as part of the design. So the tutorial is only valid for 0.9.0, and for 0.10.0 I should look at the other docs. That makes sense.

So just to clarify, what is the best strategy for now? Like should I stabilize around 0.9.0, or is it better to follow 0.10.0? I was not sure which API is the most stable one for now.

I actually really like how the new version is using the standard pytorch DataLoaders 😃.
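The choice being weighed here can be made explicit by branching on `torchtext.__version__`. A hedged sketch: the version boundaries below (legacy counter-based constructor up to 0.9, new API from 0.10 with the old class kept under `torchtext.legacy` until it was dropped in 0.12) reflect my reading of the release notes and should be checked against the changelog. `parse_major_minor` and `vocab_api_for` are hypothetical helpers.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '0.11.0' or '0.11.0+cu111'."""
    release = version.split("+")[0]  # drop any local build tag
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def vocab_api_for(version: str) -> str:
    """Describe which vocab-building API to use (assumed version boundaries)."""
    major_minor = parse_major_minor(version)
    if major_minor < (0, 10):
        return "legacy: torchtext.vocab.Vocab(counter, min_freq=..., specials=...)"
    if major_minor < (0, 12):
        return "new: build_vocab_from_iterator; legacy Vocab in torchtext.legacy.vocab"
    return "new: build_vocab_from_iterator; legacy module removed"


for v in ("0.9.1", "0.11.0+cu111", "0.12.0"):
    print(v, "->", vocab_api_for(v))
```

Pinning 0.9.x only postpones the migration, which is consistent with the maintainer's advice to target >= 0.10.0.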
