einsum fails to optimize, leads to very expensive evaluation
einsum fails to optimize. For example, for
import numpy as np

n1 = 6
n2 = 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)
print(np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                     a, a, a, b, a, a, a, b, optimize=True)[1])
output:
Complete contraction: gfl,egh,efj,mjk,cdn,bck,bdi,oih->lmno
Naive scaling: 14
Optimized scaling: 14
Naive FLOP count: 2.446e+10
Optimized FLOP count: 2.446e+10
Theoretical speedup: 1.000
Largest intermediate: 2.560e+02 elements
--------------------------------------------------------------------------
scaling current remaining
--------------------------------------------------------------------------
14 oih,bdi,bck,cdn,mjk,efj,egh,gfl->lmno lmno->lmno
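The returned path itself shows that no pairwise step was chosen, only a single all-at-once contraction; a quick check (the commented output is my expectation, not captured from the thread):

path = np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                      a, a, a, b, a, a, a, b, optimize=True)[0]
print(path)  # expected: ['einsum_path', (0, 1, 2, 3, 4, 5, 6, 7)]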
It did not attempt any optimization (even with the flag optimize=True, 'greedy', or 'optimal') and evaluates the expression naively, which scales as n^14, where n is the dimension of a tensor leg. The problem should be trivial to optimize, for instance by the following path (worked out by hand):
scaling   current              remaining
--------------------------------------------------------------------------
5         gfl,efj->egjl        oih,bdi,bck,cdn,mjk,egh,egjl->lmno
5         egjl,egh->hjl        oih,bdi,bck,cdn,mjk,hjl->lmno
5         cdn,bck->bdkn        oih,bdi,bdkn,mjk,hjl->lmno
5         bdkn,bdi->ikn        oih,ikn,mjk,hjl->lmno
5         mjk,hjl->hklm        oih,ikn,hklm->lmno
5         oih,ikn->hkno        hkno,hklm->lmno
6         hkno,hklm->lmno      lmno->lmno
which scales as n^6. Even performing any single pairwise contraction first would bring the scaling below n^14.
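As a workaround, this hand-derived path can be fed straight to np.einsum, whose optimize argument also accepts an explicit contraction list in the format returned by np.einsum_path. A minimal sketch; the index tuples are my translation of the table above into positions within the shrinking operand list:

import numpy as np

n1, n2 = 6, 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)

# Each tuple names the two operands (by current position) to contract next;
# the intermediate result is appended to the end of the operand list.
hand_path = ['einsum_path',
             (0, 2),   # gfl,efj -> egjl
             (0, 6),   # egh,egjl -> hjl
             (1, 2),   # cdn,bck -> bdkn
             (1, 4),   # bdi,bdkn -> ikn
             (0, 2),   # mjk,hjl -> hklm
             (0, 1),   # oih,ikn -> hkno
             (0, 1)]   # hklm,hkno -> lmno

result = np.einsum('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                   a, a, a, b, a, a, a, b, optimize=hand_path)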
Also, when one sets n1=4, n2=50 and runs the same code, the contraction is optimized correctly and runs fast. In this case einsum_path with optimize=True finds a path that scales as n^8:
n1 = 4
n2 = 50
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)
print(np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                     a, a, a, b, a, a, a, b, optimize=True)[1])
output:
Complete contraction: gfl,egh,efj,mjk,cdn,bck,bdi,oih->lmno
Naive scaling: 14
Optimized scaling: 8
Naive FLOP count: 1.280e+18
Optimized FLOP count: 2.950e+08
Theoretical speedup: 4338394779.222
Largest intermediate: 6.250e+06 elements
--------------------------------------------------------------------------
scaling current remaining
--------------------------------------------------------------------------
5 oih,egh->egio gfl,efj,mjk,cdn,bck,bdi,egio->lmno
5 mjk,efj->efkm gfl,cdn,bck,bdi,egio,efkm->lmno
6 egio,bdi->bdego gfl,cdn,bck,efkm,bdego->lmno
6 efkm,bck->bcefm gfl,cdn,bdego,bcefm->lmno
8 bcefm,bdego->cdfgmo gfl,cdn,cdfgmo->lmno
7 cdfgmo,gfl->cdlmo cdn,cdlmo->lmno
6 cdlmo,cdn->lmno lmno->lmno
For some reason, in the first case (n1=6, n2=4) the optimization algorithm for einsum did not run at all.
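A plausible explanation, consistent with the maintainer replies below: by default einsum_path caps intermediate arrays at the size of the largest input or output array, and in the first case that cap excludes every useful pairwise contraction. A quick size check (a sketch; the default-cap behaviour is my reading of the einsum_path docstring):

import numpy as np

n1, n2 = 6, 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)

largest_input = max(a.size, b.size)   # 144 elements
output_size = n2 ** 4                 # l, m, n, o are all of dimension n2: 256
intermediate = n1 * n1 * n2 * n2      # e.g. egjl or bdkn: 576 elements
# 576 > 256, so with the default cap the greedy search rejects these steps
# and falls back to the naive all-at-once contraction.
print(largest_input, output_size, intermediate)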
Numpy/Python version information:
NumPy 1.15.0; Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
Ah, well, OK, then it should have been faster (with optimization), maybe. Note that passing a larger memory cap through the tuple notation for optimize does optimize it correctly; as per the documentation, the intermediates are otherwise limited in size. One can also set 'optimal', but that seems to take a long time. Now, that limit is probably a bit small, so maybe @dgasmith has an opinion about it.
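The tuple notation mentioned here passes an explicit upper bound on intermediate size alongside the strategy. A sketch of such a call (the cap of 2**20 elements is an arbitrary illustrative value, not what was used in the thread):

import numpy as np

n1, n2 = 6, 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)

# ('greedy', <memory_limit>) lifts the default cap on intermediate size,
# letting the pairwise contractions be considered.
print(np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                     a, a, a, b, a, a, a, b,
                     optimize=('greedy', 2 ** 20))[1])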
Well, we can probably increase the cap a bit in any case, but maybe it is really mostly a documentation thing. That said, I am not sure I like the tuple notation if memory turns out to be such an important argument to pass.
Your idea about a warning for large memory usage with a very large cap has its merit, although I am not sure I would like it if it became the default for einsum itself at some point.