How do I share the vocab between the source and target language for machine translation
Hi, how do I create a combined vocabulary from the source and target fields of the Multi30k dataset? I am interested in having a shared encoder which can represent both source and target words.
SRC = Field(tokenize=tokenize_de,
            init_token='<sos>',
            eos_token='<eos>',
            lower=True,
            batch_first=True)

TRG = Field(tokenize=tokenize_en,
            init_token='<sos>',
            eos_token='<eos>',
            lower=True,
            batch_first=True)

train_data, valid_data, test_data = Multi30k.splits(exts=('.de', '.en'),
                                                    fields=(SRC, TRG))

SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
@thak123
This will cause your SRC and TRG to share a single vocab. You can then use one embedding layer for both languages instead of one per language.

omg… @bentrevett thanks for the code.
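
For readers who want to see the idea of a single shared vocab spelled out, here is a minimal, library-independent sketch in plain Python. It is not the snippet from the comment above; the helper names `build_shared_vocab` and `numericalize` and the toy sentences are made up for illustration. It counts tokens over both languages together, keeps those that meet `min_freq`, and maps every kept token to one index, so one embedding table could serve both sides.

```python
from collections import Counter

# Special tokens, mirroring the init/eos tokens used in the question.
SPECIALS = ["<unk>", "<pad>", "<sos>", "<eos>"]


def build_shared_vocab(src_sentences, trg_sentences, min_freq=2):
    """Hypothetical helper: one vocabulary counted over both languages."""
    counter = Counter()
    for sentence in list(src_sentences) + list(trg_sentences):
        counter.update(sentence)
    # Specials first, then tokens by descending frequency (ties alphabetical).
    itos = list(SPECIALS)
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in SPECIALS:
            itos.append(token)
    stoi = {token: index for index, token in enumerate(itos)}
    return stoi, itos


def numericalize(sentence, stoi):
    """Hypothetical helper: tokens -> ids, wrapped in <sos> ... <eos>."""
    unk = stoi["<unk>"]
    return [stoi["<sos>"]] + [stoi.get(t, unk) for t in sentence] + [stoi["<eos>"]]


# Toy, already-tokenized data (made up for this example).
src = [["ein", "mann", "in", "berlin"], ["ein", "hund"]]
trg = [["a", "man", "in", "berlin"], ["a", "dog"]]

stoi, itos = build_shared_vocab(src, trg, min_freq=2)
src_ids = numericalize(src[0], stoi)
trg_ids = numericalize(trg[0], stoi)

# Tokens that occur in both languages ("in", "berlin") get the same id
# on the source and target side, which is what a shared embedding needs.
# In legacy torchtext the analogous step would presumably be
#   SRC.build_vocab(train_data.src, train_data.trg, min_freq=2)
#   TRG.vocab = SRC.vocab
# (an assumption, not confirmed by this thread).
```

With a shared mapping like this, the embedding layer only needs `len(itos)` rows, and words spelled identically in both languages share one row.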