irradiance.clearsky_index() isn't tolerant of pandas DataFrame inputs

Describe the bug
I’m running into problems with irradiance.clearsky_index() when the inputs have mixed data types. Specifically, clearsky_ghi is often a pandas Series. If ghi shares that index but is a DataFrame, the division behaves strangely because pandas aligns a Series against a DataFrame’s columns rather than its row index. I understand that this is a peculiarity of pandas rather than something wrong with pvlib, so it may be something you don’t want to fix, but I wanted to bring it up.

To Reproduce
This code produces an unexpected output. Although this example uses a one-column DataFrame, which is a trivial case, the same thing happens with a DataFrame that has multiple columns.

import pandas as pd
import pvlib

index = pd.date_range('2022-10-10 10:00:00', '2022-10-10 11:00:00', freq='1h')
cs_ghi = pd.Series([1,1], index=index)
ghi = pd.DataFrame([0.5,0.5], columns=['A'], index=index)
pvlib.irradiance.clearsky_index(ghi, cs_ghi)
# returns array([[0., 0., 0.], [0., 0., 0.]])
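The all-zero result follows from pandas alignment rules: dividing a DataFrame by a Series matches the Series index against the DataFrame’s columns, not its row index. The pandas-only sketch below (pvlib is not involved) reproduces the misaligned division and shows the row-wise alternative, DataFrame.div with axis=0:

```python
import pandas as pd

index = pd.date_range('2022-10-10 10:00:00', '2022-10-10 11:00:00', freq='1h')
cs_ghi = pd.Series([1, 1], index=index)
ghi = pd.DataFrame([0.5, 0.5], columns=['A'], index=index)

# Default DataFrame / Series arithmetic aligns the Series index with the
# DataFrame columns: 'A' plus the two timestamps gives three all-NaN columns.
misaligned = ghi / cs_ghi
print(misaligned.shape)  # (2, 3)

# axis=0 aligns the Series with the row index instead.
aligned = ghi.div(cs_ghi, axis=0)
print(aligned)
```

Because the label 'A' never matches a timestamp, every cell of the misaligned result is NaN, which is consistent with the 2 x 3 array of zeros reported above.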

Expected behavior
Since they share an index, ideally ghi would still be converted to a clear-sky index. In principle, this ought to work even with multiple columns in the DataFrame (my actual use case). So the output I’d like to see here is:

                       A
2022-10-10 10:00:00  0.5
2022-10-10 11:00:00  0.5

Versions:

  • pvlib.__version__: 0.9.0
  • pandas.__version__: 1.3.2
  • python: 3.9.7

Additional context
There’s a current workaround, which is to apply the function column by column:

ghi.apply(lambda x: pvlib.irradiance.clearsky_index(x, cs_ghi), axis=0)
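The column-by-column workaround works because DataFrame.apply with axis=0 hands each column to the function as a Series that carries the shared DatetimeIndex, so the Series-to-Series division aligns on time. A minimal sketch, in which ratio_1d is a hypothetical stand-in for any function that expects 1-D inputs (it is not pvlib’s clearsky_index):

```python
import pandas as pd

def ratio_1d(x, ref):
    # Hypothetical 1-D function: Series / Series aligns on the shared index.
    return x / ref

index = pd.date_range('2022-10-10 10:00:00', '2022-10-10 11:00:00', freq='1h')
cs_ghi = pd.Series([1.0, 2.0], index=index)
ghi = pd.DataFrame({'A': [0.5, 0.5], 'B': [1.0, 1.0]}, index=index)

# axis=0: the lambda receives one column (a Series) at a time, and apply
# reassembles the returned Series into a DataFrame with the same columns.
result = ghi.apply(lambda col: ratio_1d(col, cs_ghi), axis=0)
print(result)
```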

If you know that ghi is a DataFrame, this can also be resolved internally by dividing along the row index instead of broadcasting across the columns:

clearsky_index = ghi.div(cs_ghi, axis=0)

But the logical tests later in the function then run into broadcasting issues, and the output may need reformatting as well.

So again, this boils down to whether enough people use DataFrames as inputs that it’s worth going through that typing business within the pvlib function. I’m happy to look at how to make it a little more type-agnostic if you want to resolve it that way. If not, it might be worth documenting that both ghi and cs_ghi must be 1-D for this to work.
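For illustration only, here is one way the type-specific route could look. The helper clearsky_index_df is hypothetical (not part of pvlib), assumes ghi is a DataFrame and clearsky_ghi is a Series on a compatible index, and mirrors the zero/NaN/clip steps of the existing function. The idea is to broadcast the Series into a DataFrame with identical labels first, so that the later logical tests no longer hit the column-alignment problem:

```python
import numpy as np
import pandas as pd

def clearsky_index_df(ghi, clearsky_ghi, max_clearsky_index=2.0):
    """Hypothetical DataFrame-aware variant; not part of pvlib."""
    # Repeat the clear-sky Series across ghi's columns so every later
    # operation works on two objects with identical row and column labels.
    cs_values = clearsky_ghi.reindex(ghi.index).to_numpy()[:, None]
    cs = pd.DataFrame(np.repeat(cs_values, ghi.shape[1], axis=1),
                      index=ghi.index, columns=ghi.columns)
    csi = ghi / cs
    # Set +inf, -inf and NaN to zero ...
    csi = csi.where(np.isfinite(csi), 0.0)
    # ... but put NaN back wherever either input is non-finite.
    input_is_nan = ~np.isfinite(ghi) | ~np.isfinite(cs)
    csi = csi.mask(input_is_nan)
    # clip leaves NaN untouched.
    return csi.clip(lower=0, upper=max_clearsky_index)

index = pd.date_range('2022-10-10 10:00:00', '2022-10-10 11:00:00', freq='1h')
print(clearsky_index_df(pd.DataFrame([0.5, 0.5], columns=['A'], index=index),
                        pd.Series([1, 1], index=index)))
```

The cost of this approach is exactly the type handling the issue asks about: the body only works for a DataFrame/Series pair, so scalars, arrays, and Series inputs would still need their own paths.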

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

jranalli commented, Jul 11, 2022

OK, got a follow-up. I made this work for 3D arrays, and it also passes all the tests you proposed. The trick is to replace my previous proposal with:

while np.ndim(ghi) > np.ndim(clearsky_ghi):
    clearsky_ghi = np.expand_dims(clearsky_ghi, axis=-1)
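The loop works because NumPy broadcasting compares shapes starting from the trailing axis. Appending length-1 axes to the clear-sky array lines its time axis up with the first axis of ghi; without them, NumPy would try to match the length-3 clear-sky array against the last axis (length 2) and raise a broadcasting error. A standalone illustration in plain NumPy, with made-up values:

```python
import numpy as np

ghi = np.full((3, 2, 2), 100.0)                # time on the first axis
clearsky_ghi = np.array([500.0, 800.0, 1000.0])

# Append trailing length-1 axes until the dimensions match:
# (3,) -> (3, 1) -> (3, 1, 1), which broadcasts against (3, 2, 2).
while np.ndim(ghi) > np.ndim(clearsky_ghi):
    clearsky_ghi = np.expand_dims(clearsky_ghi, axis=-1)

ratio = ghi / clearsky_ghi
print(clearsky_ghi.shape)  # (3, 1, 1)
print(ratio.shape)         # (3, 2, 2)
```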

Here’s a test on a 3D array in which time is the “page” dimension, i.e., the first axis:

import numpy as np
from pvlib import irradiance

ghi_td = np.array([[[100, 100], [100, 100]], [[200, 200], [200, 200]], [[500, 500], [500, 500]]])
ghi_cs = np.array([500, 800, 1000])
out = irradiance.clearsky_index(ghi_td, ghi_cs)  # needs the expand_dims change above
expected = np.array([[[0.2, 0.2], [0.2, 0.2]], [[0.25, 0.25], [0.25, 0.25]], [[0.5, 0.5], [0.5, 0.5]]])
np.testing.assert_allclose(out, expected)

I put this in a branch, and would be happy to make it into a full pull request if you like this solution.

wholmgren commented, May 25, 2022

> it’s worth going through that typing business within the pvlib function.

I think we’re leaning in the other direction: avoid anything to do with types in functions like this. See #1455. I’m not aware of any broadcasting tricks compatible with both numpy and pandas that would solve this problem, but maybe someone else is.

The code below solves the problem if you’re willing to call .to_frame() on the input Series to add another dimension, then .to_numpy() to strip the index.

With that approach you’d still be stuck with array output, though. In many pvlib functions we use np.where because it can handle scalars (unlike boolean indexing), but it always returns a numpy array. That means we would instead need special code to handle a Series (or anything else, like a DataFrame). It might be better to cast a scalar to an array, use boolean indexing in the function logic, and then reduce the result back to a scalar. That way, a DataFrame in results in a DataFrame out. Let’s see…

import numpy as np

# modifications in pvlib.irradiance.clearsky_index
def clearsky_index(ghi, clearsky_ghi, max_clearsky_index=2.0):
    clearsky_index = ghi / clearsky_ghi
    # wrap scalars in a 0-d array so the boolean-mask assignments below work
    if np.isscalar(clearsky_index):
        scalar_out = True
        clearsky_index = np.asarray(clearsky_index)
    else:
        scalar_out = False
    # set +inf, -inf, and nans to zero
    clearsky_index[~np.isfinite(clearsky_index)] = 0
    # but preserve nans in the input arrays
    input_is_nan = ~np.isfinite(ghi) | ~np.isfinite(clearsky_ghi)
    clearsky_index[input_is_nan] = np.nan

    # clip to [0, max_clearsky_index]
    clearsky_index = np.maximum(clearsky_index, 0)
    clearsky_index = np.minimum(clearsky_index, max_clearsky_index)

    # unwrap back to a Python scalar if the input was scalar
    if scalar_out:
        return clearsky_index.item()
    return clearsky_index


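import numpy as np
import pandas as pd
import dask.array as da
from pvlib import irradiance  # with the modified clearsky_index above

In [99]: irradiance.clearsky_index(.5, 1)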
Out[99]: 0.5

index = pd.date_range('2022-10-10 10:00:00', '2022-10-10 12:00:00', freq='1h')

cs_ghi = pd.Series([1,1,1], index=index)

In [100]: ghi_df = pd.DataFrame([[0.5,0.5],[1, 1],[.75, .75]], columns=['A', 'B'], index=index)

In [101]: ghi_s = pd.Series([0.5, 1, .75], index=index)

In [104]: irradiance.clearsky_index(ghi_s, cs_ghi)
Out[104]:
2022-10-10 10:00:00    0.50
2022-10-10 11:00:00    1.00
2022-10-10 12:00:00    0.75
Freq: H, dtype: float64

In [107]: irradiance.clearsky_index(ghi_df, cs_ghi.to_frame().to_numpy())
Out[107]:
                        A     B
2022-10-10 10:00:00  0.50  0.50
2022-10-10 11:00:00  1.00  1.00
2022-10-10 12:00:00  0.75  0.75

In [109]: ghi_da = da.from_array(np.array([0.5, 1, .75]))

In [110]: ghi_cs_da = da.from_array(np.array([1, 1, 1]))

In [111]: irradiance.clearsky_index(ghi_da, ghi_cs_da)
Out[111]: dask.array<minimum, shape=(3,), dtype=float64, chunksize=(3,), chunktype=numpy.ndarray>

In [112]: irradiance.clearsky_index(ghi_da, ghi_cs_da).compute()
Out[112]: array([0.5 , 1.  , 0.75])
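The In [107] call works because .to_frame() turns the clear-sky Series into a single-column (n, 1) DataFrame and .to_numpy() drops its labels; with no labels to align on, pandas broadcasts that single column across the DataFrame’s columns, row by row. A pandas-only sketch with made-up clear-sky values (not pvlib output):

```python
import pandas as pd

index = pd.date_range('2022-10-10 10:00:00', '2022-10-10 12:00:00', freq='1h')
cs_ghi = pd.Series([1, 2, 4], index=index)
ghi_df = pd.DataFrame([[0.5, 0.5], [1, 1], [0.75, 0.75]],
                      columns=['A', 'B'], index=index)

# .to_frame() adds a second dimension; .to_numpy() strips the index and
# column labels, leaving a plain (3, 1) array.
cs_2d = cs_ghi.to_frame().to_numpy()
print(cs_2d.shape)  # (3, 1)

# The (3, 1) array is broadcast across both columns; the result keeps
# ghi_df's index and columns.
result = ghi_df / cs_2d
print(result)
```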

The special case for scalars is gross but probably worth it.
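To illustrate the trade-off outside pvlib: np.where accepts scalars but always returns a plain ndarray, whereas boolean-mask assignment keeps a Series (or DataFrame) intact but needs an indexable target. A bare Python float does not support item assignment, which is why the modified function wraps scalars in a 0-d array first. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, np.inf, 1.0])

# np.where works on anything array-like, but the index is lost.
via_where = np.where(np.isfinite(s), s, 0)
print(type(via_where).__name__)  # ndarray

# Boolean-mask assignment on a copy keeps the Series type and index.
via_mask = s.copy()
via_mask[~np.isfinite(via_mask)] = 0
print(type(via_mask).__name__)   # Series

# A 0-d array accepts the same mask assignment that a float would reject.
x = np.asarray(0.5)
x[~np.isfinite(x)] = 0
print(x.item())                  # 0.5
```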