einsum fails to optimize, leads to very expensive evaluation
einsum fails to optimize. For example, for
import numpy as np

n1 = 6
n2 = 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)
print(np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                     a, a, a, b, a, a, a, b, optimize=True)[1])
output:
Complete contraction: gfl,egh,efj,mjk,cdn,bck,bdi,oih->lmno
Naive scaling: 14
Optimized scaling: 14
Naive FLOP count: 2.446e+10
Optimized FLOP count: 2.446e+10
Theoretical speedup: 1.000
Largest intermediate: 2.560e+02 elements
--------------------------------------------------------------------------
scaling current remaining
--------------------------------------------------------------------------
14 oih,bdi,bck,cdn,mjk,efj,egh,gfl->lmno lmno->lmno
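The returned path itself shows that no pairwise step was chosen, only a single all-at-once contraction; a quick check (the commented output is my expectation, not captured from the thread):

path = np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                      a, a, a, b, a, a, a, b, optimize=True)[0]
print(path)  # expected: ['einsum_path', (0, 1, 2, 3, 4, 5, 6, 7)]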
It did not attempt any optimization (even with the flag optimize=True, 'greedy', or 'optimal') and evaluates the expression naively, which scales as n^14, where n is the dimension of a tensor leg. The problem should be trivial to optimize, for instance by the following path (worked out by hand):
scaling   current              remaining
--------------------------------------------------------------------------
5         gfl,efj->egjl        oih,bdi,bck,cdn,mjk,egh,egjl->lmno
5         egjl,egh->hjl        oih,bdi,bck,cdn,mjk,hjl->lmno
5         cdn,bck->bdkn        oih,bdi,bdkn,mjk,hjl->lmno
5         bdkn,bdi->ikn        oih,ikn,mjk,hjl->lmno
5         mjk,hjl->hklm        oih,ikn,hklm->lmno
5         oih,ikn->hkno        hkno,hklm->lmno
6         hkno,hklm->lmno      lmno->lmno
which scales as n^6. Even performing any single pairwise contraction first would bring the scaling below n^14.
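As a workaround, this hand-derived path can be fed straight to np.einsum, whose optimize argument also accepts an explicit contraction list in the format returned by np.einsum_path. A minimal sketch; the index tuples are my translation of the table above into positions within the shrinking operand list:

import numpy as np

n1, n2 = 6, 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)

# Each tuple names the two operands (by current position) to contract next;
# the intermediate result is appended to the end of the operand list.
hand_path = ['einsum_path',
             (0, 2),   # gfl,efj -> egjl
             (0, 6),   # egh,egjl -> hjl
             (1, 2),   # cdn,bck -> bdkn
             (1, 4),   # bdi,bdkn -> ikn
             (0, 2),   # mjk,hjl -> hklm
             (0, 1),   # oih,ikn -> hkno
             (0, 1)]   # hklm,hkno -> lmno

result = np.einsum('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                   a, a, a, b, a, a, a, b, optimize=hand_path)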
Also, when one sets n1=4, n2=50 and runs the same code, the contraction is optimized correctly and runs fast. In this case einsum_path with optimize=True finds a path that scales as n^8:
n1 = 4
n2 = 50
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)
print(np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                     a, a, a, b, a, a, a, b, optimize=True)[1])
output:
Complete contraction: gfl,egh,efj,mjk,cdn,bck,bdi,oih->lmno
Naive scaling: 14
Optimized scaling: 8
Naive FLOP count: 1.280e+18
Optimized FLOP count: 2.950e+08
Theoretical speedup: 4338394779.222
Largest intermediate: 6.250e+06 elements
--------------------------------------------------------------------------
scaling current remaining
--------------------------------------------------------------------------
5 oih,egh->egio gfl,efj,mjk,cdn,bck,bdi,egio->lmno
5 mjk,efj->efkm gfl,cdn,bck,bdi,egio,efkm->lmno
6 egio,bdi->bdego gfl,cdn,bck,efkm,bdego->lmno
6 efkm,bck->bcefm gfl,cdn,bdego,bcefm->lmno
8 bcefm,bdego->cdfgmo gfl,cdn,cdfgmo->lmno
7 cdfgmo,gfl->cdlmo cdn,cdlmo->lmno
6 cdlmo,cdn->lmno lmno->lmno
For some reason, in the first case (n1=6, n2=4) the optimization algorithm for einsum did not run at all.
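A plausible explanation, consistent with the maintainer replies below: by default einsum_path caps intermediate arrays at the size of the largest input or output array, and in the first case that cap excludes every useful pairwise contraction. A quick size check (a sketch; the default-cap behaviour is my reading of the einsum_path docstring):

import numpy as np

n1, n2 = 6, 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)

largest_input = max(a.size, b.size)   # 144 elements
output_size = n2 ** 4                 # l, m, n, o are all of dimension n2: 256
intermediate = n1 * n1 * n2 * n2      # e.g. egjl or bdkn: 576 elements
# 576 > 256, so with the default cap the greedy search rejects these steps
# and falls back to the naive all-at-once contraction.
print(largest_input, output_size, intermediate)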
Numpy/Python version information:
NumPy 1.15.0; Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
Ah, well, OK, then it should have been faster (with optimization), maybe. Note that passing a larger memory cap through the tuple notation for optimize does optimize it correctly; as per the documentation, the intermediates are otherwise limited in size. One can also set 'optimal', but that seems to take a long time. Now, that limit is probably a bit small, so maybe @dgasmith has an opinion about it.
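The tuple notation mentioned here passes an explicit upper bound on intermediate size alongside the strategy. A sketch of such a call (the cap of 2**20 elements is an arbitrary illustrative value, not what was used in the thread):

import numpy as np

n1, n2 = 6, 4
a = np.random.rand(n1, n1, n2)
b = np.random.rand(n2, n2, n2)

# ('greedy', <memory_limit>) lifts the default cap on intermediate size,
# letting the pairwise contractions be considered.
print(np.einsum_path('gfl,egh,efj,mjk,cdn,bck,bdi,oih',
                     a, a, a, b, a, a, a, b,
                     optimize=('greedy', 2 ** 20))[1])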
Well, we can probably increase the cap a bit in any case, but maybe it is really mostly a documentation thing. That said, I am not sure I like the tuple notation if memory turns out to be such an important argument to pass.
Your idea about a warning for large memory usage with a very large cap has its merit, although I am not sure I would like it if it became the default for einsum itself at some point.