Special token index is verbose.
Context
Special tokens are frequently used for masking, padding, or interpreting the model. In an Encoder/Decoder context, it is important that the decoder and encoder share the same indices for EOS, SOS, and PAD.
Problem
When creating two fields, one for French and one for English, there are no class constants for the index of `eos_token`. The only way to find the index of `eos_token` is per instance of the class (e.g. `self.stoi[eos_token]`).
By default, nothing in the code guarantees that the French dictionary assigns `eos_token` the same index as the English dictionary.
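The mismatch is easy to reproduce with a toy vocab builder. This is a simplified sketch, not torchtext's actual code: it only assumes that specials are assigned indices before ordinary tokens, so two fields configured with different specials (say, one with an SOS token and one without) end up with different EOS indices.

```python
from collections import Counter

def build_stoi(tokens, specials=("<pad>", "<eos>")):
    """Toy vocab builder: specials get the first indices, then the
    remaining tokens are added by frequency. Illustrative only."""
    stoi = {}
    for tok in specials:
        stoi[tok] = len(stoi)
    for tok, _ in Counter(tokens).most_common():
        if tok not in stoi:
            stoi[tok] = len(stoi)
    return stoi

# Two fields with different specials configurations:
en_stoi = build_stoi(["the", "cat"], specials=("<pad>", "<eos>"))
fr_stoi = build_stoi(["le", "chat"], specials=("<pad>", "<sos>", "<eos>"))

print(en_stoi["<eos>"])  # 1
print(fr_stoi["<eos>"])  # 2 -- the two vocabs disagree on the EOS index
```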
Possible Solution A
When setting the optional parameter `eos_token`, would it be possible to also set `eos_token_index`?
Possible Solution B
A Vocab or Field constant for the index of each special token.
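Solution B could look roughly like the following sketch. The class and constant names here are hypothetical, not torchtext's API: the point is that reserving the special-token indices at the class level makes every vocab instance agree on them by construction.

```python
class Vocab:
    # Hypothetical class constants (Solution B): every instance
    # reserves the same indices for the specials, so the English and
    # French vocabs agree on EOS/SOS/PAD without any extra checks.
    PAD_INDEX = 0
    SOS_INDEX = 1
    EOS_INDEX = 2
    SPECIALS = ("<pad>", "<sos>", "<eos>")

    def __init__(self, tokens):
        # Specials first, at their fixed indices; then ordinary tokens.
        self.stoi = {tok: i for i, tok in enumerate(self.SPECIALS)}
        for tok in tokens:
            if tok not in self.stoi:
                self.stoi[tok] = len(self.stoi)

en_vocab = Vocab(["the", "cat"])
fr_vocab = Vocab(["le", "chat"])
print(en_vocab.stoi["<eos>"] == fr_vocab.stoi["<eos>"] == Vocab.EOS_INDEX)
```

With this design the decoder can refer to `Vocab.EOS_INDEX` directly, without holding a reference to any particular vocab instance.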
Issue Analytics
- Created: 6 years ago
- Comments: 12 (12 by maintainers)
Top GitHub Comments
I feel like it'd be a mistake to design this library to mimic OpenNMT's data-handling utils / focus on the seq2seq application. I personally don't really see the need to have it as a constant (it's not too hard to reference `self.field.vocab` anyway). Frankly, I don't think the verbosity is an issue, as long as it keeps things clear.

Okay! Thanks for your input!
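The per-instance lookup the maintainer refers to is short in practice. The snippet below uses a stand-in object in place of a built torchtext field (the attribute shape `field.eos_token` / `field.vocab.stoi` follows the legacy torchtext `Field` API; the stand-in values are made up for illustration):

```python
from types import SimpleNamespace

# Stand-in for a built torchtext Field; in real code this would come
# from Field(eos_token="<eos>") followed by build_vocab().
field = SimpleNamespace(
    eos_token="<eos>",
    vocab=SimpleNamespace(stoi={"<pad>": 0, "<sos>": 1, "<eos>": 2}),
)

# The per-instance lookup the comment suggests is sufficient:
eos_index = field.vocab.stoi[field.eos_token]
print(eos_index)  # 2 with this stand-in vocab
```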