confused about tgt_seq and gold, please give me some help
First, thanks for sharing your code. I got a little confused about tgt_seq in the Transformer.forward method (Models.py):
tgt_seq, tgt_pos = tgt_seq[:, :-1], tgt_pos[:, :-1]
Can you tell me why you discard the last token of the target sequence? Is it related to the code below?
gold = tgt_seq[:, 1:]
Any help will be appreciated!
Because only the first len(tgt_seq) - 1 words are significant. The decoder takes [BOS, word1_gt, word2_gt, …] as input and outputs [word1_pred, word2_pred, …]. In other words, the maximum length of the decoder's output is len(tgt_seq) - 1. Besides, I ran an experiment with two tgt_seq encodings: in the first model, tgt_seq looks like [BOS, word1, …, EOS, PAD, …, PAD]; in the second, it looks like [BOS, word1, word2, …, PAD, …, PAD, EOS]. The second model performed worse than the first.
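To make the shift concrete, here is a minimal sketch of how the decoder input and the gold target line up. The BOS/EOS/PAD ids below are made-up values for illustration, not the constants the repo actually uses:

```python
import torch

BOS, EOS, PAD = 2, 3, 0  # hypothetical token ids, not the repo's actual constants

# One padded target sequence: [BOS, w1, w2, w3, EOS, PAD, PAD]
tgt_seq = torch.tensor([[BOS, 11, 12, 13, EOS, PAD, PAD]])

dec_input = tgt_seq[:, :-1]  # [BOS, w1, w2, w3, EOS, PAD] -> fed to the decoder
gold      = tgt_seq[:, 1:]   # [w1, w2, w3, EOS, PAD, PAD] -> training targets

# Position i of dec_input predicts position i of gold, so the model learns
# p(w1 | BOS), p(w2 | BOS, w1), ..., p(EOS | BOS, w1, w2, w3).
print(dec_input)  # tensor([[ 2, 11, 12, 13,  3,  0]])
print(gold)       # tensor([[11, 12, 13,  3,  0,  0]])
```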
As for the decoder input: we won't decode at the <end> position, since we've finished generating as soon as we produce <end>. Decoding lengths are therefore the actual lengths - 1, hence tgt_seq, tgt_pos = tgt_seq[:, :-1], tgt_pos[:, :-1]. The last word of the decoder input should be <pad>, not <end>.
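Following the same logic, the padded positions in gold shouldn't contribute to training. One common way to do that (not necessarily this repo's exact loss, which may add extras like label smoothing) is cross-entropy with ignore_index; the PAD id and tensor shapes below are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

PAD = 0  # hypothetical pad id, matching the sketch above

batch, dec_len, vocab = 1, 6, 100             # dec_len = len(tgt_seq) - 1
logits = torch.randn(batch, dec_len, vocab)   # stand-in for the decoder's output
gold = torch.tensor([[11, 12, 13, 3, PAD, PAD]])  # shifted target from the sketch above

# ignore_index=PAD drops the padded positions from the loss, so only the
# significant len(tgt_seq) - 1 predictions are trained on.
loss = F.cross_entropy(logits.view(-1, vocab), gold.view(-1), ignore_index=PAD)
print(loss)
```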