Why does ALBERT use einsum in the PyTorch implementation while the TF one does not?
See original GitHub issue

❓ Questions & Help
I wanted to learn the internals of the ALBERT model from your implementation (which, by the way, is really clean compared to the original one - good job!), but I've stumbled upon a weird-looking part in AlbertAttention: https://github.com/huggingface/transformers/blob/6af3306a1da0322f58861b1fbb62ce5223d97b8a/src/transformers/modeling_albert.py#L258

Why does the PyTorch version use einsum-based notation to compute the hidden state (manually using the dense layer's weights), while the TensorFlow version simply reshapes the context_layer and runs a standard forward pass through the dense layer?
I would really like to know the explanation behind this implementation - @LysandreJik, could you shed some light here?
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (6 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
For no particular reason, but it might not have been the best choice according to this thread on performance.
The two implementations are equivalent, but the PyTorch version is cumbersome. I think the code should be rewritten.
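The code block that originally followed this comment is not preserved here. As a rough, hypothetical sketch of the kind of rewrite being suggested (reshape the context layer and call the dense layer normally, mirroring the TF version), it might look like the module below; this is not the actual transformers patch:

```python
import torch
import torch.nn as nn


class AlbertProjectionSketch(nn.Module):
    """Hypothetical rewrite of the einsum projection step in AlbertAttention.

    Not the actual transformers patch: it just re-expresses the projection as
    reshape + a normal nn.Linear forward, the way the TF implementation does.
    """

    def __init__(self, num_attention_heads: int, attention_head_size: int):
        super().__init__()
        self.all_head_size = num_attention_heads * attention_head_size
        self.dense = nn.Linear(self.all_head_size, self.all_head_size)

    def forward(self, context_layer: torch.Tensor) -> torch.Tensor:
        # context_layer: (batch, seq_len, num_heads, head_size)
        new_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.reshape(new_shape)  # (batch, seq_len, all_head_size)
        return self.dense(context_layer)  # standard dense forward, no manual einsum


# Example usage with illustrative sizes:
# AlbertProjectionSketch(12, 64)(torch.randn(2, 4, 12, 64)).shape -> torch.Size([2, 4, 768])
```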