question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

`MaskedQuantity` breaking `np.nanmean` etc. methods in `aggregate_downsample`

See original GitHub issue

Description

A downstream issue lightkurve/lightkurve#1157 appears to have been triggered by the use of Masked objects, probably following #11127. The same code that returned a timeseries subclass with Quantity columns, “masked” with np.nan where applicable, with astropy 5.0rc1 is creating columns of MaskedQuantity. This is breaking any code using functions like np.nanmean that are currently not supported.

Expected behaviour

MaskedQuantity should either fully support required array functions or not be used where still unsupported functions are required.

Actual behaviour

For a more minimal example that can also be constructed in 4.3 (i.e. the aggregate_downsample failure is independently from its updates in 5.0) – note that the array functions do work for np.ma.array, but the NaNs don’t seem to be correctly caught in the downsampling:

Steps to Reproduce

import numpy as np
from astropy import units as u
from astropy.time import Time
from astropy.timeseries import Timeseries, aggregate_downsample
from astropy.utils.masked import Masked

ma = np.ma.array([[1, 2, 13, 248, np.nan, np.nan, 303, 12, 21]], mask=np.array([0, 0, 0, 1, 0, 1, 1, 0, 0])).T
np.nanmean(ma)
9.8

np.nanmean(ma.data)
85.71428571428571

mq = Masked(ma) * u.m
np.nanmean(qa)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in nanmean
TypeError: no implementation found for 'numpy.nanmean' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedQuantity'>]

tsa = TimeSeries(time=Time(np.arange(9)+4e4, format='mjd'), data=ma)
tsq = TimeSeries(time=Time(np.arange(9)+4e4, format='mjd'), data=mq)
aggregate_downsample(tsa, time_bin_size=3*u.d)
<BinnedTimeSeries length=3>
time_bin_start time_bin_size        col0      
                     s                        
    object        float64         float64     
-------------- ------------- -----------------
       40000.0      259200.0 5.333333333333333
40002.99999991      259200.0               nan
40005.99999982      259200.0              16.5

aggregate_downsample(tsq, time_bin_size=3*u.d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/lib/python3.10/site-packages/astropy/timeseries/downsample.py", line 127, in aggregate_downsample
    data[unique_indices] = u.Quantity(reduceat(values.value, groups, aggregate_func),
  File "/opt/lib/python3.10/site-packages/astropy/timeseries/downsample.py", line 28, in reduceat
    result.append(function(array[indices[i]:indices[i+1]]))
  File "<__array_function__ internals>", line 5, in nanmean
TypeError: no implementation found for 'numpy.nanmean' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedNDArray'>]

I am not sure how exactly the timeseries subclass in the downstream issue is instantiated, but the change to MaskedQuantity occurred with 5.0rc1 without any other code changes (@barentsen may have more background on the read-in details). So the code should probably be more conservative in using Masked objects for now, or functions like aggregate_downsample will need to escape those methods applying them either to the unmasked instances or their numpy content.

np.nanmean(np.ma.array(mq.value.data, mask=mq.mask))
9.8

Better still would of course be to directly support those functions. A naïve attempt to just add them to function_helpers.MASKED_SAFE_FUNCTIONS actually (sort of) worked for np.nanmedian, but np.nanmean, np.nansum, np.nanprod` then fail with

File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 937, in nanmean
    arr, mask = _replace_nan(a, 0)
  File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 108, in _replace_nan
    np.copyto(a, val, where=mask)
  File "<__array_function__ internals>", line 5, in copyto
TypeError: no implementation found for 'numpy.copyto' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedNDArray'>]

so they will probably need a dedicated implementation – analogously to MaskedIterator.mean?

System Details

Python: 3.8.12 / 3.9.8 / 3.10.0 Platform: macOS-10.14.6-x86_64-i386-64bit Astropy: 5.0rc1 Numpy: 1.21.2 Scipy: 1.7.1 Matplotlib: 3.5.0rc1

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Reactions:1
  • Comments:18 (18 by maintainers)

github_iconTop GitHub Comments

1reaction
mhvkcommented, Nov 9, 2021

I’ll try to push my fix for the nanfunctions later today - it was mostly done…

1reaction
dhomeiercommented, Nov 8, 2021

Also, I am now seeing that np.nansum never returns NaN, not even for all-NaN arrays, so I guess it should not return Masked arrays either. Must admit I find this somewhat inconsistent…

I understand that as sum([]) just as with all-masked arrays, so returning 0.0 looks logical to me. Likewise for nanmean, which would then turn into 0.0 / len([]) = np.nan.

Read more comments on GitHub >

github_iconTop Results From Across the Web

numpy.nanmean — NumPy v1.24 Manual
Compute the arithmetic mean along the specified axis, ignoring NaNs. Returns the average of the array elements. The average is taken over the...
Read more >
Error in numpy nanmean - python - Stack Overflow
In my case, I had some inf and -inf values inside the array, which are valid float numbers, but would cause any mean()...
Read more >
Python | numpy.nanmean() function - GeeksforGeeks
numpy.nanmean() function can be used to calculate the mean of array ignoring the NaN value. If array have NaN value and we can...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found