
Here is an example where np.einsum is slower than a manual matmul with transpositions, on both CPU and GPU. (The example in #1966 runs equally fast for me, but this one is consistently slower.)

https://colab.research.google.com/gist/romanngg/e63834765d00497e315455867a52eae1/einsum_is_slow.ipynb

import jax.numpy as np
import jax.random as random
from jax import jit  # jax.api was an internal module; jit is exported from jax directly

a = random.normal(random.PRNGKey(1), (100, 20, 20, 3))
b = random.normal(random.PRNGKey(2), (200, 20, 20, 3))

@jit
def matmul(a, b):
  # Move the contracted axis last on `a` and first on `b`, batch-matmul over
  # the (x, y) dims, then transpose the result back to (n, m, x, y).
  return np.transpose(
      np.matmul(np.transpose(a, axes=(1, 2, 0, 3)),
                np.transpose(b, axes=(1, 2, 3, 0))),
      axes=(2, 3, 0, 1))

@jit
def einsum(a, b):
  return np.einsum('nxyc,mxyc->nmxy', a, b, optimize=True)

# The two implementations should agree (up to float32 rounding).
np.sum(np.abs(einsum(a, b) - matmul(a, b)))

%timeit einsum(a, b).block_until_ready()

%timeit matmul(a, b).block_until_ready()
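For anyone timing this outside the notebook: JAX dispatches work asynchronously and jit-compiles on first call, so a standalone script needs a warm-up call and block_until_ready(). A minimal sketch (this timing loop is my own, not from the issue):

import time

# Warm up: the first call triggers XLA compilation, which should not be timed.
einsum(a, b).block_until_ready()
matmul(a, b).block_until_ready()

def bench(f, n=100):
  start = time.perf_counter()
  for _ in range(n):
    f(a, b).block_until_ready()  # wait for the async dispatch to finish
  return (time.perf_counter() - start) / n

print(f'einsum: {bench(einsum):.6f} s/call')
print(f'matmul: {bench(matmul):.6f} s/call')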

Also note that if you run it on CPU, the difference between the two methods' outputs becomes non-zero, DeviceArray(0.01003271, dtype=float32); not sure how concerning that is.
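One way to probe whether that discrepancy is just float32 accumulation-order noise (my own suggestion, not from the issue) is to redo the comparison in float64 via JAX's jax_enable_x64 flag. The flag must be set before any arrays are created, so this needs a fresh process, with the definitions above re-run after it:

import jax
jax.config.update('jax_enable_x64', True)

import jax.numpy as np
import jax.random as random

a64 = random.normal(random.PRNGKey(1), (100, 20, 20, 3))  # float64 in x64 mode
b64 = random.normal(random.PRNGKey(2), (200, 20, 20, 3))

# If this shrinks toward zero, the float32 gap above is accumulation-order
# rounding rather than a real correctness bug.
print(np.sum(np.abs(einsum(a64, b64) - matmul(a64, b64))))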

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

5 reactions
randolf-scholz commented, Feb 14, 2022

Ran some benchmarks across several libraries; times are in seconds on a Ryzen 3900X.

size             64                     128                    256
dtype            float32                float32                float32
lib            numpy  torch   jax    numpy  torch   jax    numpy  torch    jax
reduction
ij,ijkl->kl     0.00   0.01  0.02     0.05   0.05  0.06     0.75   0.76   0.70
ji,ijkl->kl     0.00   0.00  0.02     0.05   0.05  0.06     0.75   0.76   0.70
jk,ijkl->il     0.00   0.01  0.03     0.05   0.20  0.26     0.77   2.75   2.78
kj,ijkl->il     0.01   0.01  0.04     0.16   0.20  0.22     2.33   2.80   2.87
ik,ijkl->jl     0.01   0.02  0.03     0.15   0.19  0.22     2.35   2.84   2.94
ki,ijkl->jl     0.01   0.01  0.04     0.15   0.20  0.22     2.34   2.78   3.07
li,ijkl->jk     0.01   0.01  0.04     0.29   0.23  0.26     4.61  10.14   9.66
il,ijkl->jk     0.00   0.02  0.04     0.06   0.22  0.28     0.82   9.98   9.86
lj,ijkl->ik     0.01   0.01  0.04     0.30   0.23  0.27     4.66   9.62  10.02
jl,ijkl->ik     0.00   0.02  0.04     0.06   0.22  0.25     0.90  10.07  10.08
lk,ijkl->ij     0.01   0.00  0.07     0.28   0.04  0.82     4.27   0.76  59.20
kl,ijkl->ij     0.00   0.00  0.06     0.06   0.05  0.83     0.90   0.76  62.52
np.__version__='1.21.5'
torch.__version__='1.10.2'
jax.__version__='0.2.21'
jaxlib.__version__='0.1.76'
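The comment doesn't include the benchmark script itself. A minimal sketch of what such a harness could look like for two of the patterns above, contracting a 2D matrix against a 4D tensor (the size, pattern list, and 5-repeat timeit loop are my assumptions):

import timeit
import numpy
import jax
import jax.numpy as jnp

size = 64  # the smallest size from the table above
patterns = ['ij,ijkl->kl', 'kl,ijkl->ij']  # two of the twelve reductions

matrix = numpy.random.rand(size, size).astype('float32')
tensor = numpy.random.rand(size, size, size, size).astype('float32')
matrix_j, tensor_j = jnp.asarray(matrix), jnp.asarray(tensor)

for p in patterns:
    t_numpy = timeit.timeit(lambda: numpy.einsum(p, matrix, tensor), number=5) / 5
    f = jax.jit(lambda m, t, p=p: jnp.einsum(p, m, t))
    f(matrix_j, tensor_j).block_until_ready()  # compile outside the timed loop
    t_jax = timeit.timeit(lambda: f(matrix_j, tensor_j).block_until_ready(),
                          number=5) / 5
    print(f'{p}: numpy {t_numpy:.4f} s, jax {t_jax:.4f} s')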