Why 8 attention heads rather than 4 for BaseUnet64?
Hi! The text => image UNet in the Imagen paper follows the UNet architecture defined in Improved Denoising Diffusion Probabilistic Models. In that paper, they use 4 attention heads:
In the BaseUnet64, the number of attention heads is set to 8:
https://github.com/lucidrains/imagen-pytorch/blob/2535012168d8839130af9c2b61ae17d6df3a7064/imagen_pytorch/imagen_pytorch.py#L1712
Is this because more attention heads are generally better?
The whole thing is a bit confusing, because the Imagen paper doesn't specify the number of heads; rather, it specifies the number of channels per head. That suggests layers near the bottom of the UNet would have more attention heads, since they have more channels, which would be a deviation from OpenAI's UNet architecture that the Imagen paper claims to follow. Not sure what's going on there.
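To make the two conventions concrete, here is a small illustrative sketch (not code from imagen-pytorch or the OpenAI repo; the function names and the `dim_head=64` default are assumptions for illustration): fixing the channels per head makes the head count grow with the layer width, while fixing the head count keeps it constant at every resolution.

```python
# Illustrative only: contrasting the two head-count conventions discussed above.

def heads_from_channels_per_head(layer_dim: int, dim_head: int = 64) -> int:
    # "Channels per head" reading: wider (deeper) layers get more heads,
    # e.g. 512 channels -> 8 heads, 1024 channels -> 16 heads.
    return layer_dim // dim_head

def heads_fixed(layer_dim: int, heads: int = 8) -> int:
    # Fixed-head-count reading: the head count is a hyperparameter,
    # independent of the layer width.
    return heads

for dim in (256, 512, 1024):
    print(dim, heads_from_channels_per_head(dim), heads_fixed(dim))
```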
Top GitHub Comments
Ahhh, that’s the thing I was missing. They are projecting from the model dimensions to 256. Got it got it! Thanks for clarifying.
No, it is still pretty similar: they hold the number of attention heads constant at 4, with a head dimension of 64, so you would project from the model dimensions to 256.
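A minimal sketch of what that projection looks like (assuming a generic PyTorch-style attention layer, not the exact module in imagen-pytorch): queries, keys, and values are projected from the model dimension down to heads * dim_head = 4 * 64 = 256, so the attention width stays the same no matter how wide the layer is.

```python
import torch
from torch import nn

class Attention(nn.Module):
    # Minimal sketch: inner attention width = heads * dim_head = 4 * 64 = 256,
    # regardless of the model (layer) dimension.
    def __init__(self, dim, heads=4, dim_head=64):
        super().__init__()
        self.heads = heads
        inner_dim = heads * dim_head                  # 256 here
        self.scale = dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
        self.to_out = nn.Linear(inner_dim, dim, bias=False)

    def forward(self, x):                             # x: (batch, seq, dim)
        b, n, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # split heads: (batch, heads, seq, dim_head)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)                       # project back to the model dim

x = torch.randn(2, 16, 512)                           # a 512-channel layer
print(Attention(dim=512)(x).shape)                    # attention still runs at width 256
```

With heads fixed at 4 and dim_head at 64, a 512-channel layer and a 1024-channel layer both attend at width 256; only the input and output projections change.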