How to share vocabulary across fields?

I’m new to torchtext. I want to use two fields which should have the same vocabulary. The only difference is that field2 prepends an <sos> token to every sequence. I have the following code:

import torch
from torchtext.data import ReversibleField, TabularDataset

# tokenizer is my own tokenization function (not shown)
field1 = ReversibleField(torch.LongTensor, tokenize=tokenizer)
field2 = ReversibleField(torch.LongTensor, tokenize=tokenizer, init_token='<sos>')
dataset = TabularDataset(path='train.json', format='json',
                         fields={'x': ('x', field1),  'y': ('y', field2)})
field1.build_vocab(dataset, max_size=30000)

Now I want field2 to use the vocab of field1. I tried field2.vocab = field1.vocab, but this results in an error in later processing. According to the documentation, the only way to force the same vocabulary seems to be to use the same field, but setting the init_token dynamically isn’t possible when the dataset is read by a BucketIterator.
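
The idea of one vocabulary used by both fields can be sketched without torchtext. The classes and function below (`SharedVocab`, `SimpleField`, `share_vocab`) are hypothetical stand-ins, not torchtext's API. The sketch assumes the vocabulary is built from both columns and must contain every special token either field can emit (here `<sos>`, which field1 never produces itself); the same object is then attached to both fields, so only one copy exists.

```python
from collections import Counter


class SharedVocab:
    """Hypothetical token <-> index table; a stand-in, not torchtext's Vocab."""

    def __init__(self, counter, specials, max_size=None):
        # Special tokens first, so they get fixed, low indices.
        self.itos = list(specials)
        # max_size caps how many of the most frequent tokens are considered.
        for tok, _ in counter.most_common(max_size):
            if tok not in specials:
                self.itos.append(tok)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    def lookup(self, tok):
        # Unknown tokens fall back to the <unk> index.
        return self.stoi.get(tok, self.stoi["<unk>"])


class SimpleField:
    """Hypothetical field: tokenize, optionally prepend init_token, map to ids."""

    def __init__(self, tokenize, init_token=None):
        self.tokenize = tokenize
        self.init_token = init_token
        self.vocab = None  # assigned from outside so several fields can share it

    def process(self, text):
        tokens = self.tokenize(text)
        if self.init_token is not None:
            tokens = [self.init_token] + tokens
        return [self.vocab.lookup(tok) for tok in tokens]


def share_vocab(fields, columns, max_size=None):
    """Build ONE vocab from all columns and attach the same object to every field."""
    counter = Counter()
    for field, column in zip(fields, columns):
        for text in column:
            counter.update(field.tokenize(text))
    # The shared vocab must contain every special token any field can emit.
    specials = ["<unk>", "<pad>"]
    for field in fields:
        if field.init_token is not None and field.init_token not in specials:
            specials.append(field.init_token)
    vocab = SharedVocab(counter, specials, max_size)
    for field in fields:
        field.vocab = vocab
    return vocab


tokenizer = str.split  # placeholder for the question's tokenizer
field1 = SimpleField(tokenizer)
field2 = SimpleField(tokenizer, init_token="<sos>")
xs = ["a b c", "b c"]  # toy 'x' column
ys = ["c d", "a"]      # toy 'y' column
vocab = share_vocab([field1, field2], [xs, ys], max_size=30000)

print(field1.vocab is field2.vocab)  # True
print(field1.process("a b"))
print(field2.process("a b"))         # same ids, with the <sos> id in front
```

Because both fields hold a reference to one object, anything attached to that vocabulary (such as pretrained vectors) exists once rather than in two copies.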

My current workaround is to save field1 first and then load it again as field2:

field1 = ReversibleField(torch.LongTensor, tokenize=tokenizer, init_token='<sos>')
# ... build vocab as before
torch.save(field1, 'field1.pt')
field2 = torch.load('field1.pt')
field1.init_token = None

However, I don’t know whether this will also work if the fields are associated with word vectors. Since the vocabs of field1 and field2 are essentially two independent copies, updates to the word embeddings would have to be performed on both fields. Alternatively, I could manually create a vocabulary dict and use it to initialize two copies of Vocab, but how can I make the two fields use them?
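
The concern about two independent copies can be illustrated with a small, library-free sketch. `torch.save`/`torch.load` serialize objects with Python's `pickle` module by default, so a pickle round-trip stands in for the save/load workaround here; `make_field` and the vocab layout (`itos` plus a `vectors` list) are hypothetical, not torchtext's structures. The sketch contrasts an update made after copying with one made after assigning the same vocab object to both fields.

```python
import pickle
from types import SimpleNamespace


def make_field(itos, dim=3, init_token=None):
    """Hypothetical stand-in for a field whose vocab carries per-token vectors."""
    vocab = SimpleNamespace(itos=list(itos), vectors=[[0.0] * dim for _ in itos])
    return SimpleNamespace(init_token=init_token, vocab=vocab)


field1 = make_field(["<unk>", "<pad>", "<sos>", "hello"], init_token="<sos>")

# The workaround from the question: save field1, load it back as field2.
# A pickle round-trip stands in for torch.save/torch.load here.
field2 = pickle.loads(pickle.dumps(field1))
field1.init_token = None

hello = field1.vocab.itos.index("hello")
field1.vocab.vectors[hello][0] = 1.0          # update "hello" through field1 only
before_sharing = field2.vocab.vectors[hello][0]
print(before_sharing)                         # 0.0: field2 holds an independent copy

# Sharing by reference: both fields point at the SAME vocab object,
# so the vectors only need to be updated once.
field2.vocab = field1.vocab
field1.vocab.vectors[hello][0] = 2.0
print(field2.vocab.vectors[hello][0])         # 2.0
print(field2.init_token)                      # <sos>: field2 keeps its own init_token
```

The point of the sketch is only the copy-versus-reference distinction: after the round-trip, every change must be applied twice, while a shared object needs it once and still lets each field keep its own `init_token`.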

Is there a recommended way to share vocabulary?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

5 reactions
hohoCode commented, Aug 23, 2018

I have exactly the same question: is there an official way to share a vocabulary ACROSS fields? Thanks!

3 reactions
yyHaker commented, Dec 24, 2018

I think this might be a solution! [image]
