confused about tgt_seq and gold, please give me some help
First, thanks for sharing your code. I got a little confused about tgt_seq in the Transformer.forward method (Models.py):
tgt_seq, tgt_pos = tgt_seq[:, :-1], tgt_pos[:, :-1]
Can you tell me why you discard the last token of the target sequence? Is it related to the code below?
gold = tgt_seq[:, 1:]
Any help will be appreciated!
Because only the first len(tgt_seq) - 1 words are significant. The decoder takes [BOS, word1_gt, word2_gt, …] as input and outputs [word1_pred, word2_pred, …]. In other words, the maximum length of the decoder's output is len(tgt_seq) - 1. Besides, I ran an experiment with two tgt_seq encodings: in the first model, tgt_seq looks like [BOS, word1, …, EOS, PAD, …, PAD]; in the second, it looks like [BOS, word1, word2, …, PAD, …, PAD, EOS]. The second model performed worse than the first.
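To make the shift concrete, here is a minimal sketch of how the decoder input and the gold target line up. The BOS/EOS/PAD ids below are made-up values for illustration, not the constants the repo actually uses:

```python
import torch

BOS, EOS, PAD = 2, 3, 0  # hypothetical token ids, not the repo's actual constants

# One padded target sequence: [BOS, w1, w2, w3, EOS, PAD, PAD]
tgt_seq = torch.tensor([[BOS, 11, 12, 13, EOS, PAD, PAD]])

dec_input = tgt_seq[:, :-1]  # [BOS, w1, w2, w3, EOS, PAD] -> fed to the decoder
gold      = tgt_seq[:, 1:]   # [w1, w2, w3, EOS, PAD, PAD] -> training targets

# Position i of dec_input predicts position i of gold, so the model learns
# p(w1 | BOS), p(w2 | BOS, w1), ..., p(EOS | BOS, w1, w2, w3).
print(dec_input)  # tensor([[ 2, 11, 12, 13,  3,  0]])
print(gold)       # tensor([[11, 12, 13,  3,  0,  0]])
```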
As for the decoder input: we won't decode at the <end> position, since we've finished generating as soon as we produce <end>. Decoding lengths are therefore the actual lengths - 1, hence tgt_seq, tgt_pos = tgt_seq[:, :-1], tgt_pos[:, :-1]. The last word of the decoder input should be <pad>, not <end>.
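Following the same logic, the padded positions in gold shouldn't contribute to training. One common way to do that (not necessarily this repo's exact loss, which may add extras like label smoothing) is cross-entropy with ignore_index; the PAD id and tensor shapes below are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

PAD = 0  # hypothetical pad id, matching the sketch above

batch, dec_len, vocab = 1, 6, 100             # dec_len = len(tgt_seq) - 1
logits = torch.randn(batch, dec_len, vocab)   # stand-in for the decoder's output
gold = torch.tensor([[11, 12, 13, 3, PAD, PAD]])  # shifted target from the sketch above

# ignore_index=PAD drops the padded positions from the loss, so only the
# significant len(tgt_seq) - 1 predictions are trained on.
loss = F.cross_entropy(logits.view(-1, vocab), gold.view(-1), ignore_index=PAD)
print(loss)
```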