Question on enc_input_mask and ignore_index / pad_value for EncDec.

Sorry if this has already been answered. I am a little confused about how to tell the Enc-Dec model to ignore the padding value, both for training and for generating. Is ignore_index / pad_value sufficient, or does an additional enc_input_mask also need to be fed in, with the positions of the padding value set to False?
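To make the question concrete, the mask I have in mind marks real tokens as True and padding positions as False. Below is a minimal sketch of building such a mask. The helper name build_padding_mask, the padding id 0, and the toy src batch are assumptions made up for illustration and are not part of reformer_pytorch.

```python
import torch


def build_padding_mask(tokens: torch.Tensor, pad_tok: int) -> torch.Tensor:
    """Return a boolean mask that is True for real tokens and False for padding."""
    return tokens != pad_tok


# Assumption for this example only: 0 is the padding token id.
pad_tok = 0

# Toy batch of two right-padded sequences (hypothetical data).
src = torch.tensor([
    [5, 7, 9, 0, 0],
    [3, 4, 0, 0, 0],
])

enc_input_mask = build_padding_mask(src, pad_tok)
print(enc_input_mask)

# As far as I can tell from the README, the mask would then be passed as:
#   loss = model(src, trg, return_loss=True, enc_input_mask=enc_input_mask)
```

Whether this mask is required in addition to ignore_index / pad_value is exactly what I am unsure about.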

My implementation is similar to the following code

from reformer_pytorch import ReformerEncDec

pad_tok = ... # token id used for padding
INPUT_DIM = 20000
OUTPUT_DIM = 20000
max_length = 256

model = ReformerEncDec(
    enc_num_tokens = INPUT_DIM,
    enc_max_seq_len = max_length,
    dec_num_tokens = OUTPUT_DIM,
    dec_max_seq_len = max_length,
    ignore_index = pad_tok,
    pad_value = pad_tok
).to(device)

Forward pass and training step:

optimizer = ...
# src shape: (32, max_length)
# trg shape: (32, max_length)
...
# training
optimizer.zero_grad()
loss = model(src, trg, return_loss=True)
loss.backward()
optimizer.step()

To generate for a batch, I simply call the following:

seq_out = torch.zeros((src.shape[0], 1)).long().to(device) 
sample = model.generate(src, seq_out, 
    seq_len = max_length, 
    input_mask=None)

Also, is it normal for generation to take roughly half a minute for a batch size of 32 and one minute for a batch size of 64? I have been testing on a Kaggle Kernel (Tesla P100, 16 GB VRAM) with the following parameters and the same generate code block as above.

model = ReformerEncDec(
    dim = 64,
    enc_num_tokens = 20000,
    enc_depth = 2,
    enc_max_seq_len = 256,
    enc_heads = 4,
    dec_num_tokens = 20000,
    dec_depth = 2,
    dec_max_seq_len = 256,
    dec_heads = 4,
    ignore_index = pad_tok,
    pad_value = pad_tok
)
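For anyone who wants to reproduce the timing, here is a minimal sketch of a wall-clock timing helper. The name time_call and its repeats and sync parameters are hypothetical and made up for this example. On a GPU, passing torch.cuda.synchronize as sync should give more reliable numbers, since CUDA calls are asynchronous.

```python
import time


def time_call(fn, *args, repeats=3, sync=None, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times.

    Returns (best elapsed seconds, result of the last call).
    `sync` is an optional zero-argument callable run before and after each
    call, e.g. torch.cuda.synchronize when timing GPU work.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        if sync is not None:
            sync()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload so the sketch runs without a model or a GPU.
elapsed, total = time_call(sum, range(1000))
print(f"best of 3: {elapsed:.6f} s, result = {total}")

# With the model above, the call might look like:
#   elapsed, sample = time_call(model.generate, src, seq_out,
#                               seq_len=max_length, sync=torch.cuda.synchronize)
```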