irradiance.clearsky_index() isn't tolerant of pandas DataFrame inputs
Describe the bug I’m getting some issues with irradiance.clearsky_index() when the inputs have mixed datatypes. Specifically, clearsky_ghi is often a pandas Series. If the ghi input shares that index but is a DataFrame, the division behaves strangely due to how pandas interprets the indices. I understand that this is a peculiarity of pandas rather than something wrong with pvlib, so it may be something that you don’t want to fix. But I just wanted to bring it up.
To Reproduce Steps to reproduce the behavior: the code below produces an unexpected output. This example uses a one-column DataFrame, which is a trivial case, but the same thing happens with a multi-column DataFrame.
import pandas as pd
import pvlib

index = pd.date_range('2022-10-10 10:00:00', '2022-10-10 11:00:00', freq='1h')
cs_ghi = pd.Series([1, 1], index=index)
ghi = pd.DataFrame([0.5, 0.5], columns=['A'], index=index)
pvlib.irradiance.clearsky_index(ghi, cs_ghi)
# returns array([[0., 0., 0.], [0., 0., 0.]])
Expected behavior Since they share an index, ideally ghi would still be converted to clearsky_index. In principle, this ought to work even with multiple columns in the DataFrame (the actual use case I have). So the output that you’d like to see here is:
A
2022-10-10 10:00:00 0.5
2022-10-10 11:00:00 0.5
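The 2x3 shape of the reported output can be traced to pandas label alignment rather than to anything pvlib-specific. The sketch below uses only pandas and the same inputs as the report. It assumes that the first step inside the function is a plain `ghi / clearsky_ghi` division and that non-finite values are then set to zero, which is consistent with the zeros shown above but is not confirmed here.

```python
import pandas as pd

index = pd.date_range("2022-10-10 10:00:00", "2022-10-10 11:00:00", freq="1h")
cs_ghi = pd.Series([1.0, 1.0], index=index)
ghi = pd.DataFrame({"A": [0.5, 0.5]}, index=index)

# The `/` operator matches the Series index (two timestamps) against the
# DataFrame's column labels ("A"). No labels match, so the result carries
# the union of both label sets as columns and every value is NaN.
misaligned = ghi / cs_ghi
print(misaligned.shape)

# axis=0 matches the Series index against the DataFrame's row index instead,
# which is the alignment the report expects.
aligned = ghi.div(cs_ghi, axis=0)
print(aligned)
```

With a multi-column DataFrame the same mismatch occurs: each extra column adds one more all-NaN column to the misaligned result.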
Versions:
- pvlib.__version__: 0.9.0
- pandas.__version__: 1.3.2
- python: 3.9.7
Additional context There’s a current workaround, which is to apply the function column by column:
ghi.apply(lambda x: pvlib.irradiance.clearsky_index(x, cs_ghi), axis=0)
There is also a way to resolve it internally, so that the division does not broadcast across columns, if you know that ghi is a DataFrame:
clearsky_index = ghi.div(cs_ghi, axis=0)
But there are then some issues with broadcasting for the logical tests later on, and possibly formatting the output.
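One way around the later broadcasting problems, if the input is known to be a DataFrame, is to stay with pandas methods for the whole calculation. The sketch below is only an illustration: `clearsky_index_frame` is a hypothetical helper, not a pvlib function, and the zeroing and clipping steps (including the `max_clearsky_index` default of 2.0) are assumptions about the intended behavior. It also does not try to preserve NaN values that were already present in the inputs.

```python
import numpy as np
import pandas as pd


def clearsky_index_frame(ghi, clearsky_ghi, max_clearsky_index=2.0):
    """Hypothetical DataFrame-only helper; not part of pvlib."""
    # Row-wise division: align the Series index with the DataFrame rows.
    kt = ghi.div(clearsky_ghi, axis=0)
    # Keep finite values and replace inf, -inf and NaN with zero.
    # DataFrame.where returns a DataFrame, unlike np.where.
    kt = kt.where(np.isfinite(kt), 0.0)
    # Limit the result to the range [0, max_clearsky_index].
    return kt.clip(lower=0.0, upper=max_clearsky_index)


index = pd.date_range("2022-10-10 10:00:00", "2022-10-10 11:00:00", freq="1h")
cs_ghi = pd.Series([1.0, 0.0], index=index)
ghi = pd.DataFrame({"A": [0.5, 0.5], "B": [3.0, 0.0]}, index=index)

result = clearsky_index_frame(ghi, cs_ghi)
print(result)
```

The trade-off is the one raised above: this version only accepts a DataFrame for `ghi`, so scalars and numpy arrays would need a separate code path.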
So again, this boils down to whether enough people use DataFrames as inputs that it’s worth going through that typing business within the pvlib function. I’m happy to look at how to make it be a little more type agnostic if you want to resolve it that way. If not, it might be worth documenting somehow that both ghi and cs_ghi must be 1-D for this to work.
Issue Analytics
- Created a year ago
- Comments:9 (9 by maintainers)
Top GitHub Comments
Ok got a followup. I actually made this work for 3D arrays as well as all the tests you proposed. The trick is to replace my previous proposal with:
Then here’s a test on a 3D array, where the time is the “page” dimension, which is actually the first axis.
I put this in a branch, and would be happy to make it into a full pull request if you like this solution.
I think we’re leaning towards the other direction: avoid anything to do with types in functions like this. See #1455. I’m not aware of any broadcasting tricks that are compatible with both numpy and pandas that would solve this problem but maybe someone else is.
The code below solves the problem if you’re willing to call .to_frame() on the input Series to add another dimension, then .to_numpy() to strip the index.
You’d still be stuck with array output though. In many pvlib functions we use np.where because it can handle scalars (unlike boolean indexing), but it always returns a numpy array. That means that we instead need special code to handle Series (or anything else like DataFrame). Might be better to cast a scalar to an array, use boolean indexing in the function logic, then reduce the result to a scalar. Then the DataFrame in will result in a DataFrame out. Let’s see…
The special case for scalars is gross but probably worth it.
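The snippet that went with this comment is not reproduced here. As a rough illustration of the idea only, and not the commenter's code, the sketch below casts a scalar to a one-element array, applies boolean indexing, and reduces the result back to a scalar. `zero_nonfinite` is a hypothetical helper name, and it assumes the input is a scalar, a numpy array, a Series, or a DataFrame.

```python
import numpy as np
import pandas as pd


def zero_nonfinite(values):
    """Hypothetical helper: replace inf, -inf and NaN with 0, keeping the input type."""
    is_scalar = np.isscalar(values)
    if is_scalar:
        # Special case: boolean indexing needs an array, so wrap the scalar.
        values = np.array([values], dtype=float)
    else:
        # Work on a copy so the caller's data is not modified.
        values = values.copy()
    # Boolean-mask assignment works for ndarray, Series and DataFrame alike.
    values[~np.isfinite(values)] = 0.0
    # Reduce the one-element array back to a scalar if that is what came in.
    return values[0] if is_scalar else values


index = pd.date_range("2022-10-10 10:00:00", periods=2, freq="1h")
frame_out = zero_nonfinite(pd.DataFrame({"A": [0.5, np.inf]}, index=index))
series_out = zero_nonfinite(pd.Series([np.nan, 0.25], index=index))
array_out = zero_nonfinite(np.array([[1.0, -np.inf]]))
scalar_out = zero_nonfinite(float("inf"))
```

Because no step goes through np.where, each container type comes back as the type that went in, at the cost of the extra scalar branch.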