
Here is an example where np.einsum is slower than a manual matmul with transpositions, on both CPU and GPU. (The example in #1966 runs equally fast for me, but this one is consistently slower.)

https://colab.research.google.com/gist/romanngg/e63834765d00497e315455867a52eae1/einsum_is_slow.ipynb

import jax.numpy as np
import jax.random as random
from jax import jit  # jax.api was an internal module; jit is exported from jax directly

a = random.normal(random.PRNGKey(1), (100, 20, 20, 3))
b = random.normal(random.PRNGKey(2), (200, 20, 20, 3))

@jit
def matmul(a, b):
  # Move the contracted axis last on `a` and first on `b`, batch-matmul over
  # the (x, y) dims, then transpose the result back to (n, m, x, y).
  return np.transpose(
      np.matmul(np.transpose(a, axes=(1, 2, 0, 3)),
                np.transpose(b, axes=(1, 2, 3, 0))),
      axes=(2, 3, 0, 1))

@jit
def einsum(a, b):
  return np.einsum('nxyc,mxyc->nmxy', a, b, optimize=True)

# The two implementations should agree (up to float32 rounding).
np.sum(np.abs(einsum(a, b) - matmul(a, b)))

%timeit einsum(a, b).block_until_ready()

%timeit matmul(a, b).block_until_ready()
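For anyone timing this outside the notebook: JAX dispatches work asynchronously and jit-compiles on first call, so a standalone script needs a warm-up call and block_until_ready(). A minimal sketch (this timing loop is my own, not from the issue):

import time

# Warm up: the first call triggers XLA compilation, which should not be timed.
einsum(a, b).block_until_ready()
matmul(a, b).block_until_ready()

def bench(f, n=100):
  start = time.perf_counter()
  for _ in range(n):
    f(a, b).block_until_ready()  # wait for the async dispatch to finish
  return (time.perf_counter() - start) / n

print(f'einsum: {bench(einsum):.6f} s/call')
print(f'matmul: {bench(matmul):.6f} s/call')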

Also note that if you run it on CPU, the difference between the two methods' outputs becomes non-zero, DeviceArray(0.01003271, dtype=float32); not sure how concerning that is.
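One way to probe whether that discrepancy is just float32 accumulation-order noise (my own suggestion, not from the issue) is to redo the comparison in float64 via JAX's jax_enable_x64 flag. The flag must be set before any arrays are created, so this needs a fresh process, with the definitions above re-run after it:

import jax
jax.config.update('jax_enable_x64', True)

import jax.numpy as np
import jax.random as random

a64 = random.normal(random.PRNGKey(1), (100, 20, 20, 3))  # float64 in x64 mode
b64 = random.normal(random.PRNGKey(2), (200, 20, 20, 3))

# If this shrinks toward zero, the float32 gap above is accumulation-order
# rounding rather than a real correctness bug.
print(np.sum(np.abs(einsum(a64, b64) - matmul(a64, b64))))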

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

5 reactions
randolf-scholz commented, Feb 14, 2022

Ran some benchmarks across several libraries; times are in seconds on a Ryzen 3900X.

size             64                     128                    256
dtype            float32                float32                float32
lib            numpy  torch   jax    numpy  torch   jax    numpy  torch    jax
reduction
ij,ijkl->kl     0.00   0.01  0.02     0.05   0.05  0.06     0.75   0.76   0.70
ji,ijkl->kl     0.00   0.00  0.02     0.05   0.05  0.06     0.75   0.76   0.70
jk,ijkl->il     0.00   0.01  0.03     0.05   0.20  0.26     0.77   2.75   2.78
kj,ijkl->il     0.01   0.01  0.04     0.16   0.20  0.22     2.33   2.80   2.87
ik,ijkl->jl     0.01   0.02  0.03     0.15   0.19  0.22     2.35   2.84   2.94
ki,ijkl->jl     0.01   0.01  0.04     0.15   0.20  0.22     2.34   2.78   3.07
li,ijkl->jk     0.01   0.01  0.04     0.29   0.23  0.26     4.61  10.14   9.66
il,ijkl->jk     0.00   0.02  0.04     0.06   0.22  0.28     0.82   9.98   9.86
lj,ijkl->ik     0.01   0.01  0.04     0.30   0.23  0.27     4.66   9.62  10.02
jl,ijkl->ik     0.00   0.02  0.04     0.06   0.22  0.25     0.90  10.07  10.08
lk,ijkl->ij     0.01   0.00  0.07     0.28   0.04  0.82     4.27   0.76  59.20
kl,ijkl->ij     0.00   0.00  0.06     0.06   0.05  0.83     0.90   0.76  62.52
np.__version__='1.21.5'
torch.__version__='1.10.2'
jax.__version__='0.2.21'
jaxlib.__version__='0.1.76'
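The comment doesn't include the benchmark script itself. A minimal sketch of what such a harness could look like for two of the patterns above, contracting a 2D matrix against a 4D tensor (the size, pattern list, and 5-repeat timeit loop are my assumptions):

import timeit
import numpy
import jax
import jax.numpy as jnp

size = 64  # the smallest size from the table above
patterns = ['ij,ijkl->kl', 'kl,ijkl->ij']  # two of the twelve reductions

matrix = numpy.random.rand(size, size).astype('float32')
tensor = numpy.random.rand(size, size, size, size).astype('float32')
matrix_j, tensor_j = jnp.asarray(matrix), jnp.asarray(tensor)

for p in patterns:
    t_numpy = timeit.timeit(lambda: numpy.einsum(p, matrix, tensor), number=5) / 5
    f = jax.jit(lambda m, t, p=p: jnp.einsum(p, m, t))
    f(matrix_j, tensor_j).block_until_ready()  # compile outside the timed loop
    t_jax = timeit.timeit(lambda: f(matrix_j, tensor_j).block_until_ready(),
                          number=5) / 5
    print(f'{p}: numpy {t_numpy:.4f} s, jax {t_jax:.4f} s')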