Question on enc_input_mask and ignore_index / pad_value for EncDec.
Sorry if this has already been answered. I am a little confused about how to tell the Enc-Dec model to ignore the padding value, both for training and for generating. Is ignore_index/pad_value sufficient, or does an additional enc_input_mask also need to be fed in, with the padding positions marked False?
My implementation is similar to the following code:
pad_tok = ... # token id used for padding
INPUT_DIM = 20000
OUTPUT_DIM = 20000
max_length = 256
model = ReformerEncDec(
    enc_num_tokens = INPUT_DIM,
    enc_max_seq_len = max_length,
    dec_num_tokens = OUTPUT_DIM,
    dec_max_seq_len = max_length,
    ignore_index = pad_tok,
    pad_value = pad_tok
).to(device)
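As a rough illustration of what ignore_index usually means, here is a minimal pure-Python sketch of a cross-entropy average that skips target positions equal to the padding id. It assumes ignore_index is handed to a cross-entropy loss in the usual PyTorch sense (as in torch.nn.functional.cross_entropy); the function name masked_nll and the toy probabilities are hypothetical. Under that assumption the setting only decides which targets count toward the loss, which is a separate matter from what the encoder attends to.

```python
import math

def masked_nll(probs, targets, ignore_index):
    """Mean negative log-likelihood over targets that are not ignore_index.

    probs:   one probability distribution (list of floats) per position
    targets: the target token id per position
    """
    losses = [
        -math.log(dist[tok])
        for dist, tok in zip(probs, targets)
        if tok != ignore_index  # padded positions contribute nothing
    ]
    if not losses:
        return 0.0  # this sketch returns 0.0 when every position is padding
    return sum(losses) / len(losses)

PAD = 0  # hypothetical padding id
probs = [[0.1, 0.9], [0.1, 0.9], [0.5, 0.5]]
targets = [1, 1, PAD]
loss = masked_nll(probs, targets, ignore_index=PAD)
```

The third position is padding, so the loss is averaged over the first two positions only.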
Feeding forward and training the model:
optimizer = ...
# src: shape (32, max_length)
# trg: shape (32, max_length)
...
# training
optimizer.zero_grad()
loss = model(src, trg, return_loss=True)
loss.backward()
optimizer.step()
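If a mask does turn out to be needed, a padding mask is simply True at real tokens and False at padding. The sketch below builds one with plain Python lists so that it runs without a GPU; the helper name padding_mask and the toy batch are hypothetical. With PyTorch tensors the same thing is src != pad_tok, which gives a boolean tensor of the same shape as src. The commented call at the end assumes the model accepts enc_input_mask as a keyword argument next to return_loss=True; that call is not verified here.

```python
def padding_mask(batch, pad_tok):
    """Return a mask with True at real tokens and False at padding."""
    return [[tok != pad_tok for tok in seq] for seq in batch]

pad_tok = 0  # hypothetical padding id
src = [
    [5, 8, 2, 0, 0],  # three real tokens followed by two pads
    [7, 3, 9, 4, 1],  # no padding
]
mask = padding_mask(src, pad_tok)

# With PyTorch tensors the equivalent is: mask = (src != pad_tok)
# Assumed call, not verified here:
# loss = model(src, trg, return_loss=True, enc_input_mask=mask)
```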
And to generate for a batch, I simply call the following:
seq_out = torch.zeros((src.shape[0], 1)).long().to(device)
sample = model.generate(src, seq_out,
                        seq_len = max_length,
                        input_mask = None)
Also, is it normal for generation to take roughly half a minute for a batch size of 32, and one minute for a batch size of 64? I have been testing on a Kaggle Kernel (Tesla P100, 16 GB VRAM) with the following parameters and the same generate code block as above.
model = ReformerEncDec(
    dim = 64,
    enc_num_tokens = 20000,
    enc_depth = 2,
    enc_max_seq_len = 256,
    enc_heads = 4,
    dec_num_tokens = 20000,
    dec_depth = 2,
    dec_max_seq_len = 256,
    dec_heads = 4,
    ignore_index = pad_tok,
    pad_value = pad_tok
)
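To get a reliable number for the timing question, it helps to measure generation in isolation. Autoregressive generation produces one token per step, so a 256-token output needs up to 256 sequential decoder passes; that is general to this style of decoding rather than specific to this library. The helper below is a minimal sketch using only the standard library; time_call and its sync parameter are hypothetical names. On a GPU, passing torch.cuda.synchronize as sync makes sure queued kernels have finished before the clock is read.

```python
import time

def time_call(fn, *args, repeats=3, sync=None, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain pending GPU work before starting the clock
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        if sync is not None:
            sync()  # wait for the call's own GPU work to finish
        best = min(best, time.perf_counter() - start)
    return best, result

# Stand-in workload so the sketch runs anywhere
elapsed, total = time_call(sum, range(100_000))

# Assumed usage with the model above (not run here):
# elapsed, sample = time_call(model.generate, src, seq_out,
#                             seq_len=max_length, sync=torch.cuda.synchronize)
```

Taking the best of a few repeats reduces the influence of one-off warm-up costs, and comparing the result for batch sizes 32 and 64 shows how the time scales with batch size.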
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
@nakarinh14 yes, it should as well
@nakarinh14 yes, that should work!