TransformerLayer input_mask format
See original GitHub issue

I am trying to use the DeepSpeedTransformerLayer and am wondering what format the attention mask should take for left-to-right language-model training. From https://github.com/microsoft/DeepSpeed/blob/44bd538b110ce0e8fc69626854631c3aee0dc094/tests/unit/test_cuda_forward.py#L181 it looks like (bs, 1, seq_len, seq_len) could be correct, but with input_size: torch.Size([1, 501, 512]) and input_mask.shape = [1, 501, 501], the line
```python
input_mask = torch.cat((input_mask,
                        torch.ones((inp_size[0], input_mask.shape[1], input_mask.shape[2],
                                    (16 - (inp_size[1] % 16))),
                                   device=input_mask.device,
                                   dtype=input_mask.dtype) * -10000), 3)
```

raises

```
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
```
There is no docstring so I figured I’d ask. Thanks!
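For context on the error itself: the failing line pads the mask along dimension 3 (the key dimension) so the sequence length becomes a multiple of 16, and that dimension only exists for a 4-D mask. Below is a minimal sketch, not DeepSpeed code, that reproduces the failure with a 3-D mask and shows the 4-D shape the padding expects; the sizes mirror the report, and the exact error text may vary by PyTorch version.

```python
import torch

bs, seq_len, hidden = 1, 501, 512
inp_size = torch.Size([bs, seq_len, hidden])
pad = 16 - (inp_size[1] % 16)  # pad the 501-token sequence up to 512

# 3-D mask as in the report: (bs, seq_len, seq_len)
mask_3d = torch.zeros(bs, seq_len, seq_len)
try:
    torch.cat((mask_3d,
               torch.ones(bs, mask_3d.shape[1], mask_3d.shape[2], pad) * -10000), 3)
except (IndexError, RuntimeError) as e:
    # e.g. IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
    print(e)

# 4-D mask, (bs, 1, seq_len, seq_len): dim 3 exists, so the padding concatenation works
mask_4d = torch.zeros(bs, 1, seq_len, seq_len)
padded = torch.cat((mask_4d,
                    torch.ones(bs, mask_4d.shape[1], mask_4d.shape[2], pad) * -10000), 3)
print(padded.shape)  # torch.Size([1, 1, 501, 512])
```

This is consistent with the (bs, 1, seq_len, seq_len) shape the linked unit test suggests.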
Issue Analytics
- Created: 3 years ago
- Comments: 19 (10 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @hwidong-na
Yes, you are right; the unit test currently does not check this type of masking. I will soon add a case for that. Thanks, Reza
I got it working with (1, 1, seq_len, seq_len)! It seems to be faster and to use less memory in early benchmarks. I am very grateful 😃.

This is obviously out of scope for this issue, but I was wondering whether it's possible to skip the FFN layers at the end of the transformer block, and/or set them to identity, and use my own custom FFN layers after your very fast attention?
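For anyone who lands here later, here is a minimal sketch of an additive causal mask in the (1, 1, seq_len, seq_len) shape that worked above. The 0 / -10000 convention is an assumption based on the -10000 padding value in the kernel snippet earlier in this issue, not something confirmed by DeepSpeed documentation.

```python
import torch

def causal_attention_mask(seq_len, device=None, dtype=torch.float32):
    """Additive left-to-right mask of shape (1, 1, seq_len, seq_len).

    Entry (i, j) is 0 where token i may attend to token j (j <= i)
    and -10000 where it may not (j > i, i.e. the future).
    """
    # Strictly upper-triangular entries mark the "future" positions.
    future = torch.triu(torch.ones(seq_len, seq_len, device=device), diagonal=1).bool()
    mask = torch.zeros(seq_len, seq_len, device=device, dtype=dtype)
    mask = mask.masked_fill(future, -10000.0)
    # Add batch and head dimensions so it broadcasts to (bs, heads, seq_len, seq_len).
    return mask.unsqueeze(0).unsqueeze(0)

input_mask = causal_attention_mask(501)
print(input_mask.shape)  # torch.Size([1, 1, 501, 501])
```

With a batch size greater than 1 the same mask broadcasts across the batch; if per-sample padding also needs to be masked, the full (bs, 1, seq_len, seq_len) shape would be needed instead.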