Cache build_vocab; Shared vocabulary
```python
src.build_vocab(mt_train, max_size=80000)
trg.build_vocab(mt_train, max_size=40000)
```

In the README example, it looks like build_vocab is called twice on the same dataset. For large datasets this could take a while.
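One practical workaround (not something torchtext provides out of the box) is to cache the built vocab to disk and reload it on later runs. A minimal sketch, assuming the legacy Field API and that the Vocab object is picklable in your torchtext version; the cache path is hypothetical:

```python
import os
import pickle

VOCAB_CACHE = "src_vocab.pkl"  # hypothetical cache location

if os.path.exists(VOCAB_CACHE):
    # Reuse the previously built vocabulary instead of re-counting tokens.
    with open(VOCAB_CACHE, "rb") as f:
        src.vocab = pickle.load(f)
else:
    # First run: build the vocab, then cache it for next time.
    src.build_vocab(mt_train, max_size=80000)
    with open(VOCAB_CACHE, "wb") as f:
        pickle.dump(src.vocab, f)
```

The same pattern applies to `trg` with its own cache file.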
Issue Analytics

- State: closed
- Created 6 years ago
- Comments: 8 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The two calls iterate through entirely separate data: the semantics of

```python
field.build_vocab(dataset)
```

are to build the vocab for the field from every column in the provided dataset that is associated with that field. So `src.build_vocab(mt_train)` only iterates over the source columns of `mt_train`, and `trg.build_vocab(mt_train)` only over the target columns; no data is read twice.

Closing as stale.