How to share vocabulary across fields?
I'm new to torchtext. I want to use two fields that should share the same vocabulary. The only difference is that `field2` prepends an `<sos>` token to every sequence. I have the following code:
```python
import torch
from torchtext.data import ReversibleField, TabularDataset

# tokenizer is defined elsewhere, e.g. tokenizer = str.split
field1 = ReversibleField(tensor_type=torch.LongTensor, tokenize=tokenizer)
field2 = ReversibleField(tensor_type=torch.LongTensor, tokenize=tokenizer,
                         init_token='<sos>')
dataset = TabularDataset(path='train.json', format='json',
                         fields={'x': ('x', field1), 'y': ('y', field2)})
field1.build_vocab(dataset, max_size=30000)
```
Now I want `field2` to use the vocab of `field1`. I tried `field2.vocab = field1.vocab`, but this results in an error in later processing. According to the documentation, the only way to force the same vocabulary seems to be to use the same field object for both columns, but setting the `init_token` dynamically isn't possible when the dataset is read by a `BucketIterator`.
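For reference, this is the kind of sharing I was hoping for. A minimal sketch, assuming the legacy `torchtext.data` API and assuming the error comes from `<sos>` never being added to the vocab built through `field1` (whose specials don't include an `init_token`):

```python
# Build one vocab over both columns, via field2 so that its specials
# (notably '<sos>') are registered alongside '<unk>' and '<pad>'.
field2.build_vocab(dataset.x, dataset.y, max_size=30000)

# Share the very same Vocab object instead of copying it: both fields
# then index tokens identically, and later updates (e.g. loaded word
# vectors) are visible through either field.
field1.vocab = field2.vocab
```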
My current workaround is to first save `field1` and then load it back as `field2`:
```python
field1 = ReversibleField(tensor_type=torch.LongTensor, tokenize=tokenizer,
                         init_token='<sos>')
# ... build vocab as before
torch.save(field1, 'field1.pt')
field2 = torch.load('field1.pt')  # independent copy, vocab included
field1.init_token = None          # only field2 should prepend <sos>
```
However, I don't know whether this will also work if the fields are associated with word vectors. Since the vocabs of `field1` and `field2` are essentially two independent copies, any update to the word embeddings would have to be performed on both fields. Alternatively, I could manually create a vocabulary dict and use it to initialize two copies of `Vocab`, but how can I make the two fields use them?
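For illustration, here is a sketch of that manual route (again assuming the legacy API, with `torchtext.vocab.Vocab` and a `Counter` built from the tokenized columns); the key point is to assign one shared `Vocab` object to both fields rather than two copies:

```python
from collections import Counter
from torchtext.vocab import Vocab

# Count tokens from both columns of the dataset.
counter = Counter()
for example in dataset:
    counter.update(example.x)
    counter.update(example.y)

# One Vocab whose specials cover the needs of both fields.
shared_vocab = Vocab(counter, max_size=30000,
                     specials=['<unk>', '<pad>', '<sos>'])

# Both fields point at the same object, so embedding updates
# (e.g. shared_vocab.load_vectors(...)) affect both at once.
field1.vocab = shared_vocab
field2.vocab = shared_vocab
```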
Is there a recommended way to share vocabulary?
Top GitHub Comments
I have exactly the same question: is there any official way to share vocabulary across fields? Thanks!
I think this may be one way to do it!