pvlib.iam.marion_integrate uses too much memory for vector inputs
pvlib.iam.marion_integrate (which is mostly relevant as a helper for pvlib.iam.marion_diffuse) needs quite a bit of memory when passed vector inputs. An input of length 1000 allocates around 2 GB of memory on my machine, so naively passing in a standard 8760-value hourly time series would use roughly 17-18 GB. Unfortunately I was very much focused on fixed-tilt simulations when I wrote pvlib's implementation and never tried it out on large vector inputs, so this problem went unnoticed until @spaneja pointed it out to me.
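Where does that linear growth come from? The following is a rough back-of-the-envelope sketch. It assumes, without checking against pvlib's source, that each tilt value is integrated over about 90 polar × 360 azimuthal sky cells and that the function keeps several intermediate arrays of shape (n_cells, n_tilts) alive at once. `array_nbytes` and `N_CELLS` are illustrative names only.

```python
import numpy as np

def array_nbytes(n_tilts, n_cells, dtype=np.float64):
    """Bytes for one intermediate array of shape (n_cells, n_tilts)."""
    return n_tilts * n_cells * np.dtype(dtype).itemsize

# Assumption: ~90 polar x 360 azimuthal integration cells per tilt value
N_CELLS = 90 * 360

for n in (1000, 8760):
    for dtype in (np.float64, np.float32):
        mb = array_nbytes(n, N_CELLS, dtype) / 1e6
        print(f"{n:>5} tilts, {np.dtype(dtype).name}: {mb:7.1f} MB per array")
```

Under these assumptions, one float64 array for 1000 tilts is about 260 MB. The observed ~2 GB peak would therefore correspond to roughly eight such arrays alive at the same time. Because every array scales linearly with input length, 8760 values end up in the 17-18 GB range quoted above.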
I think any vectorized implementation of this algorithm is going to be rather memory-heavy, so I’m skeptical that achieving even a factor of 10 reduction in memory usage is possible here without completely changing the approach (and likely shifting the burden from memory to CPU). However, here are two low-hanging fruits worth considering:
- The current implementation has a handful of large 2-D arrays local to the function that only get released when the function returns. Some of them are only used near the beginning of the function but still take up memory for the entire function duration. Using the del statement to tell Python that those arrays are no longer needed allows Python to reclaim that memory immediately and recycle it for subsequent allocations. This is probably a simplification of what actually happens, but it seems consistent with the observations below.
- np.float32 cuts memory usage in half compared with np.float64 and (probably) doesn't meaningfully change the result. It's not like surface_tilt has more than a few sig figs anyway.
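Here is a toy illustration of both bullets. `toy_integrand` is a made-up stand-in with a few large intermediates, not marion_integrate. Peak memory comes from the standard-library tracemalloc module (NumPy reports its array buffers to it) rather than memory_profiler.

```python
import tracemalloc
import numpy as np

def toy_integrand(n, use_del, dtype):
    """Hypothetical stand-in: builds three (n, n) arrays, optionally freeing one early."""
    a = np.full((n, n), 0.5, dtype=dtype)  # only needed at the start
    b = np.cos(a)                          # derived from a
    if use_del:
        del a  # drops the last reference, so NumPy can free a's buffer now
    c = b * b  # without the del, a, b and c are all alive at this point
    return float(c.sum())

def peak_mb(**kwargs):
    """Peak traced memory in MB during one call of toy_integrand."""
    tracemalloc.start()
    try:
        toy_integrand(**kwargs)
        return tracemalloc.get_traced_memory()[1] / 1e6
    finally:
        tracemalloc.stop()

# n=1000 -> 8 MB per float64 array, 4 MB per float32 array
for use_del, dtype in [(False, np.float64), (True, np.float64), (True, np.float32)]:
    peak = peak_mb(n=1000, use_del=use_del, dtype=dtype)
    print(use_del, np.dtype(dtype).name, round(peak, 1))
```

With these sizes, the peak should drop from roughly three arrays' worth to two when del is used, and then halve again with float32. Absolute numbers will vary somewhat with the NumPy version.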
Here is a rough memory and timing comparison (using memory_profiler, very handy). pvlib is the current implementation; the two del variants use a strategic sprinkling of del but are otherwise not much different from pvlib. This is for an input of length 1000. The traces here are memory usage sampled at short intervals across a single function invocation; for example, the blue pvlib trace shows that the function call took 1.4 seconds to complete and had a peak memory usage slightly higher than 2 GB.
So using a few del statements cuts peak memory usage roughly in half. Dropping down to np.float32 cuts it roughly in half again (and gives a nontrivial speedup too). It's possible that further improvements can be had with other tricks (e.g. using the out parameter that some NumPy functions provide), but I've not yet explored them.
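The out idea can be sketched on a made-up expression (not pvlib code). Writing ufunc results into a preallocated buffer via `out=` avoids materializing every intermediate. `plain` and `with_out` are hypothetical names. As before, tracemalloc stands in for memory_profiler.

```python
import tracemalloc
import numpy as np

def plain(x):
    c = np.cos(x)        # new array
    s = np.sin(x)        # new array
    prod = c * s         # new array
    return prod + x      # new array: four x-sized arrays alive at the peak

def with_out(x):
    result = np.cos(x)                    # x-sized buffer #1
    tmp = np.empty_like(x)                # x-sized buffer #2 (scratch)
    np.sin(x, out=tmp)                    # fill scratch instead of allocating
    np.multiply(result, tmp, out=result)  # cos*sin written back into result
    np.add(result, x, out=result)         # + x, still no new allocation
    return result

def peak_mb(fn, x):
    """Peak traced memory in MB during one call of fn(x)."""
    tracemalloc.start()
    try:
        fn(x)
        return tracemalloc.get_traced_memory()[1] / 1e6
    finally:
        tracemalloc.stop()

x = np.linspace(0, np.pi, 1_000_000)  # 8 MB, allocated before tracing starts
print(peak_mb(plain, x), peak_mb(with_out, x))
```

The trade-off is readability. Each in-place step has to be ordered by hand, which is one more reason such code would need good explanatory comments.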
My main question: are we open to using these two strategies in pvlib? Despite being built into Python itself, del still seems unpythonic to me for some reason. Switching away from float64 is objectionable to the extent that it's the standard in scientific computing and is therefore baked into the models by assumption. I think I'm cautiously open to both of the above approaches, iff they are accompanied by good explanatory comments and switching to float32 can be reasonably shown not to introduce a meaningful difference in output.
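One way to probe the float32 question is to run the same integration in both precisions and compare the results. The sketch below is a simplified Marion-style sky-dome average written only for this purpose. It uses 1° cells, and the ASHRAE-style IAM with b=0.05 is an arbitrary choice. It follows the general idea but is not pvlib's marion_integrate, so a real check should repeat this with pvlib's own function. `sky_iam` is a hypothetical name.

```python
import numpy as np

def sky_iam(surface_tilt, num=180, dtype=np.float64, b=0.05):
    """Hypothetical, simplified Marion-style sky-diffuse IAM: the cos(aoi)-weighted
    mean of an ASHRAE-style IAM over a sky dome cut into (pi/num)-radian cells."""
    beta = np.radians(np.asarray(surface_tilt, dtype=dtype)).reshape(-1, 1)  # (n, 1)
    ai = dtype(np.pi / num)                                   # cell width [rad]
    zen, azi = np.meshgrid(np.arange(num // 2, dtype=dtype) * ai,   # zenith edges
                           np.arange(2 * num, dtype=dtype) * ai,    # azimuth edges
                           indexing="ij")
    zen, azi = zen.ravel(), azi.ravel()                       # (m,) flattened cells
    zen_mid, azi_mid = zen + ai / 2, azi + ai / 2
    # cos(aoi) per (tilt, cell); panel azimuth fixed at 0, fine for an isotropic sky
    cosaoi = (np.cos(beta) * np.cos(zen_mid)
              + np.sin(beta) * np.sin(zen_mid) * np.cos(azi_mid))   # (n, m)
    d_omega = ai * (np.cos(zen) - np.cos(zen + ai))           # cell solid angle, (m,)
    weight = cosaoi * d_omega
    weight[cosaoi <= 0] = 0                                   # cells behind the panel
    c = np.clip(cosaoi, 1e-6, 1)                              # avoid dividing by ~0
    iam = np.clip(1 - b * (1 / c - 1), 0, 1)                  # ASHRAE-style IAM
    return (iam * weight).sum(axis=1) / weight.sum(axis=1)

tilts = np.array([0.0, 20.0, 45.0, 90.0])
r64 = sky_iam(tilts)
r32 = sky_iam(tilts, dtype=np.float32)
print(np.max(np.abs(r32 - r64) / r64))  # relative float32 vs float64 difference
```

Cells are laid out along the last axis so that each reduction runs over contiguous memory. In this toy setting the two precisions should agree to roughly float32 precision. Whether that holds for pvlib's implementation and other IAM models is exactly what would need to be shown.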
Remark: even ignoring this memory bloat, I tend to think that applying marion_integrate directly to an 8760-value time series is a bit strange. In simulations with time-series surface_tilt values, a better approach IMHO is to calculate the IAM values only for a coarse tilt grid such as np.linspace(0, 90, 91) (1° steps) or similar and use pvlib.iam.interp to generate the 8760-value IAM series. If nothing else, we might suggest that in the docs.
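The grid-then-interpolate approach can be sketched as follows. `iam_via_tilt_grid` and `fake_diffuse_iam` are hypothetical names. In a real simulation, `iam_func` would wrap something like pvlib.iam.marion_diffuse for a chosen model. np.interp stands in for pvlib.iam.interp so the sketch has no pvlib dependency. If pvlib.iam.interp is used instead, its normalize option is worth checking, since that function is written with AOI rather than tilt on the x-axis.

```python
import numpy as np

def iam_via_tilt_grid(iam_func, surface_tilt, step=1.0):
    """Evaluate an expensive, vectorized iam_func(tilts) on a coarse 0-90 degree
    tilt grid, then linearly interpolate onto an arbitrary surface_tilt series."""
    n_points = int(round(90 / step)) + 1
    grid = np.linspace(0, 90, n_points)      # step=1 -> 0, 1, ..., 90 (91 points)
    iam_grid = iam_func(grid)                # only n_points expensive evaluations
    return np.interp(surface_tilt, grid, iam_grid)

# Stand-in for an expensive model, used only so the sketch runs on its own
calls = []
def fake_diffuse_iam(tilts):
    calls.append(len(tilts))                 # record how many tilts were evaluated
    return np.cos(np.radians(tilts))

# Hypothetical tracker-like tilt series with 8760 hourly values between 0 and 60 deg
tracker_tilt = 30 + 30 * np.sin(np.linspace(0, 40 * np.pi, 8760))
iam_series = iam_via_tilt_grid(fake_diffuse_iam, tracker_tilt)
print(iam_series.shape, calls)  # (8760,) [91]
```

The expensive model runs on 91 tilts instead of 8760. Because IAM varies smoothly with tilt, the linear interpolation error at 1° spacing should be small.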
I don’t oppose anything above. Other options:
num
Not opposed to del but would the numpy out kwarg help?