`MaskedQuantity` breaking `np.nanmean` etc. methods in `aggregate_downsample`
See original GitHub issueDescription
A downstream issue lightkurve/lightkurve#1157 appears to have been triggered by the use of Masked
objects, probably following #11127. The same code that returned a timeseries
subclass with Quantity
columns, “masked” with np.nan
where applicable, with astropy 5.0rc1 is creating columns of MaskedQuantity
. This is breaking any code using functions like np.nanmean
that are currently not supported.
Expected behaviour
MaskedQuantity
should either fully support required array functions or not be used where still unsupported functions are required.
Actual behaviour
For a more minimal example that can also be constructed in 4.3 (i.e. the aggregate_downsample
failure is independently from its updates in 5.0) – note that the array functions do work for np.ma.array
, but the NaNs don’t seem to be correctly caught in the downsampling:
Steps to Reproduce
import numpy as np
from astropy import units as u
from astropy.time import Time
from astropy.timeseries import Timeseries, aggregate_downsample
from astropy.utils.masked import Masked
ma = np.ma.array([[1, 2, 13, 248, np.nan, np.nan, 303, 12, 21]], mask=np.array([0, 0, 0, 1, 0, 1, 1, 0, 0])).T
np.nanmean(ma)
9.8
np.nanmean(ma.data)
85.71428571428571
mq = Masked(ma) * u.m
np.nanmean(qa)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in nanmean
TypeError: no implementation found for 'numpy.nanmean' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedQuantity'>]
tsa = TimeSeries(time=Time(np.arange(9)+4e4, format='mjd'), data=ma)
tsq = TimeSeries(time=Time(np.arange(9)+4e4, format='mjd'), data=mq)
aggregate_downsample(tsa, time_bin_size=3*u.d)
<BinnedTimeSeries length=3>
time_bin_start time_bin_size col0
s
object float64 float64
-------------- ------------- -----------------
40000.0 259200.0 5.333333333333333
40002.99999991 259200.0 nan
40005.99999982 259200.0 16.5
aggregate_downsample(tsq, time_bin_size=3*u.d)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/lib/python3.10/site-packages/astropy/timeseries/downsample.py", line 127, in aggregate_downsample
data[unique_indices] = u.Quantity(reduceat(values.value, groups, aggregate_func),
File "/opt/lib/python3.10/site-packages/astropy/timeseries/downsample.py", line 28, in reduceat
result.append(function(array[indices[i]:indices[i+1]]))
File "<__array_function__ internals>", line 5, in nanmean
TypeError: no implementation found for 'numpy.nanmean' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedNDArray'>]
I am not sure how exactly the timeseries subclass in the downstream issue is instantiated, but the change to MaskedQuantity
occurred with 5.0rc1 without any other code changes (@barentsen may have more background on the read-in details). So the code should probably be more conservative in using Masked objects for now, or functions like aggregate_downsample
will need to escape those methods applying them either to the unmasked
instances or their numpy content.
np.nanmean(np.ma.array(mq.value.data, mask=mq.mask))
9.8
Better still would of course be to directly support those functions. A naïve attempt to just add them to function_helpers.MASKED_SAFE_FUNCTIONS
actually (sort of) worked for np.nanmedian
, but np.nanmean
, np.nansum
, np.nanprod` then fail with
File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 937, in nanmean
arr, mask = _replace_nan(a, 0)
File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 108, in _replace_nan
np.copyto(a, val, where=mask)
File "<__array_function__ internals>", line 5, in copyto
TypeError: no implementation found for 'numpy.copyto' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedNDArray'>]
so they will probably need a dedicated implementation – analogously to MaskedIterator.mean
?
System Details
Python: 3.8.12 / 3.9.8 / 3.10.0 Platform: macOS-10.14.6-x86_64-i386-64bit Astropy: 5.0rc1 Numpy: 1.21.2 Scipy: 1.7.1 Matplotlib: 3.5.0rc1
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:18 (18 by maintainers)
Top GitHub Comments
I’ll try to push my fix for the nanfunctions later today - it was mostly done…
I understand that as
sum([])
just as with all-masked arrays, so returning 0.0 looks logical to me. Likewise for nanmean, which would then turn into0.0 / len([]) = np.nan
.