Low efficiency caused by matrix-vector products in a for loop
Hi all,
When I run a moderately big (?) problem with 670 sources and 50,000 cells (3D IP inversion), it takes about 8 hrs to run 15 GN iterations, which is not too bad. I need to give a bit of context on IP inversion. It is a linear inversion, but it requires the sensitivity of the DC problem: d = Gm. For 3D, rather than generating the sensitivity matrix, I store the factorization of the system matrix and use that to solve the linear IP problem. Hence, this should be fast ..., but it was not as fast as I expected. So I did a few experiments and found an issue.
The two cells below illustrate the issue:
In the first cell, I factorize A and solve to compute the predicted (IP) data, so it requires both the factorization of A and the back-substitution. In the second cell, A is already factorized, so we do not need to factorize again; only the back-substitution is required. However, it still takes 19 sec, which is not that different from the 22 sec that includes the factorization. So I was curious why the back-substitution takes that much time, and I timed it on its own outside the simulation:
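For context, a rough sketch of that kind of timing check, assuming scipy.sparse.linalg.splu as the direct solver and a toy 3D Laplacian in place of the actual DC system matrix (the sizes below are placeholders that only loosely match the problem above), might look like:

```python
# Minimal sketch (not the SimPEG code): time the factorization and the
# back-substitution separately. A toy 3D Laplacian stands in for A.
import time
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n1 = 35                                               # ~43,000 cells in total
L1 = sp.diags([-1., 2., -1.], [-1, 0, 1], shape=(n1, n1))
I1 = sp.identity(n1)
A = (sp.kron(sp.kron(L1, I1), I1) +
     sp.kron(sp.kron(I1, L1), I1) +
     sp.kron(sp.kron(I1, I1), L1)).tocsc()            # stand-in system matrix
rhs = np.random.randn(A.shape[0], 670)                # one column per source

t0 = time.time()
Ainv = splu(A)                                        # factorization, done once
print("factorize:       %.2f s" % (time.time() - t0))

t0 = time.time()
x = Ainv.solve(rhs)                                   # back-substitution only
print("back-substitute: %.2f s" % (time.time() - t0))
```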
The back-substitution by itself only took 0.52 seconds. But evaluating problem.getAderiv took 16 seconds, which is the major portion of the total time. So I broke problem.getAderiv apart, and figured out that problem.MeSigmaDeriv takes most of the time. That in turn requires evaluating mesh3D.getEdgeInnerProductDeriv, which can be broken apart further:
Basically, the above shows that the matrix-vector product in a for loop is the monster. Does anyone have a good idea of what is happening here, and also a good fix for this issue? @rowanc1 @grosenkj @jcapriot @fourndo @lheagy @bsmithyman
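As a toy illustration of the pattern being described (this is not SimPEG's actual getAderiv or MeSigmaDeriv code, and the sizes are placeholders), compare applying a sparse operator to one vector per source inside a Python for loop with applying it to all source vectors at once:

```python
# Toy sketch only: mat-vec in a for loop vs. one sparse-dense product.
import time
import numpy as np
import scipy.sparse as sp

n, n_src = 100000, 100
D = sp.random(n, n, density=1e-5, format="csr")    # stand-in derivative operator
V = np.random.randn(n, n_src)                      # one column per source

t0 = time.time()
loop_result = [D.dot(V[:, i]) for i in range(n_src)]   # mat-vec in a for loop
print("loop of mat-vecs: %.3f s" % (time.time() - t0))

t0 = time.time()
block_result = D.dot(V)                                 # single sparse-dense product
print("single product:   %.3f s" % (time.time() - t0))
```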
I believe most of the SimPEG code can suffer from this, and solving this issue could hugely increase the efficiency of a number of SimPEG codes!
Issue Analytics
- Created: 6 years ago
- Comments: 12 (11 by maintainers)
Top GitHub Comments
Hi, the copy thing is only for the example from StackOverflow; it has nothing to do with SimPEG being slow. The point was that slicing a multidimensional array is slower than using the whole array (contiguous layout). (See the code and figure below.)
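The original StackOverflow code and figure are not reproduced in this thread; as a small stand-in sketch of the contiguity point, summing a C-ordered array one column slice at a time strides across memory, while a whole-array reduction walks it contiguously (array size here is just a placeholder):

```python
# Sketch of the contiguity point: strided column slices vs. one contiguous pass.
import time
import numpy as np

X = np.random.randn(4000, 4000)

t0 = time.time()
col_sums = np.array([X[:, j].sum() for j in range(X.shape[1])])  # strided slices
print("column slices: %.3f s" % (time.time() - t0))

t0 = time.time()
col_sums = X.sum(axis=0)                                          # contiguous pass
print("whole array:   %.3f s" % (time.time() - t0))
```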
The problem here is probably the sparse hstack. It is called every time mesh.aveE2CC is accessed. I understand that this value cannot be saved to a variable, but maybe memoization could help? In Python 3 there is lru_cache in functools; for Python 2 we would probably need to write a custom memoization function. Other options are possible as well.
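A sketch of the memoization idea mentioned above (the class, property, and attribute names here are hypothetical, not the actual SimPEG implementation): build the averaging operator once, cache it on the instance, and reuse it so the sparse hstack only happens once. This pattern works in both Python 2 and Python 3; in Python 3 one could instead decorate a method with functools.lru_cache.

```python
# Hypothetical cached-property sketch; the blocks below are placeholders for
# the real edge-to-cell-center averaging blocks.
import scipy.sparse as sp

class CachedMesh(object):
    def __init__(self, nEx, nEy, nEz, nC):
        self.nEx, self.nEy, self.nEz, self.nC = nEx, nEy, nEz, nC
        self._aveE2CC = None

    @property
    def aveE2CC(self):
        if self._aveE2CC is None:                     # build once, reuse afterwards
            ave_x = sp.random(self.nC, self.nEx, density=1e-4, format="csr")
            ave_y = sp.random(self.nC, self.nEy, density=1e-4, format="csr")
            ave_z = sp.random(self.nC, self.nEz, density=1e-4, format="csr")
            self._aveE2CC = sp.hstack([ave_x, ave_y, ave_z], format="csr") / 3.0
        return self._aveE2CC
```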
The example array A (see the code below) has size 1M x 1M. I am not sure what the size of your 3D mesh is, but if you run hstack 600 times it will take some time even if the array is smaller than the example here. It would be a good idea to see what the cost is with your mesh. Taken from StackOverflow:
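The StackOverflow snippet referred to above is not reproduced in this thread; as a rough stand-in, here is a sketch of how repeated sparse hstack calls on blocks of roughly that size add up when done once per source (the blocks, sizes, and counts are placeholders):

```python
# Sketch: rebuilding a stacked operator 600 times vs. building it once.
import time
import scipy.sparse as sp

n = 1000000
blocks = [sp.identity(n, format="csr") for _ in range(3)]

t0 = time.time()
for _ in range(600):                         # roughly one hstack per source
    A = sp.hstack(blocks, format="csr")
print("600 hstack calls: %.1f s" % (time.time() - t0))

t0 = time.time()
A = sp.hstack(blocks, format="csr")          # build once and reuse instead
print("single hstack:    %.3f s" % (time.time() - t0))
```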
That is true @lheagy. I'll put it together and make an issue there!