Use masked arrays while preserving int
A great beauty of numpy's masked arrays is that they work with any dtype, since they do not use nan. Unfortunately, when I try to put my data into an xarray.Dataset, it converts ints to float, as shown below:
In [136]: import numpy as np, xarray as xr
In [137]: x = np.arange(30, dtype="i1").reshape(3, 10)
In [138]: xr.Dataset({"count": (["x", "y"], np.ma.masked_where(x % 5 > 3, x))},
     ...:            coords={"x": range(3), "y": range(10)})
Out[138]:
<xarray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 0 1 2
Data variables:
    count    (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ...
This happens in the function _maybe_promote.
Such type “promotion” is unaffordable for me; the memory consumption of my multi-gigabyte arrays would explode by a factor of 4. Secondly, many of my integer-dtype fields are bit arrays, for which a floating-point representation is not desirable.
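To make both costs concrete, here is a minimal numpy-only sketch. It assumes int16 data (an assumption chosen because it matches the factor of 4 quoted above; the int8 example in the transcript would grow by a factor of 8) and a made-up `flags` array standing in for a bit field.

```python
import numpy as np
import numpy.ma as ma

# Assumption: int16 data, so promotion to float64 costs 4x the memory.
x = np.arange(30, dtype="i2").reshape(3, 10)
masked = ma.masked_where(x % 5 > 3, x)

# What the promotion amounts to: float64 data with NaN at masked positions.
promoted = masked.astype("f8").filled(np.nan)

# A masked array stores the original data plus a one-byte-per-element mask.
compact_bytes = masked.data.nbytes + ma.getmaskarray(masked).nbytes
print(x.nbytes, promoted.nbytes, compact_bytes)

# Bit fields: bitwise operators are defined for integers but not for floats.
flags = np.array([0b0101, 0b0010], dtype="u1")  # hypothetical bit flags
print(flags & 0b0100)
try:
    np.bitwise_and(flags.astype("f8"), 4)
    float_bitwise_ok = True
except TypeError:
    float_bitwise_ok = False
print(float_bitwise_ok)
```

The masked representation costs 1.5 times the raw int16 data here, compared with 4 times for the float64 copy.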
It would greatly benefit xarray if it could use masking while preserving the dtype of input data.
(See also: Stackoverflow question)
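Until something like this exists, one possible workaround (not proposed in the issue itself) is to split the masked array into an integer data array and a boolean mask array, which could then be stored as two separate variables. The sketch below uses only numpy; `split_masked` and `join_masked` are hypothetical helper names, and the fill value of 0 is an arbitrary placeholder.

```python
import numpy as np
import numpy.ma as ma

def split_masked(arr, fill_value=0):
    """Hypothetical helper: return (data, mask) as plain ndarrays.

    The data array keeps the original integer dtype; masked slots
    hold fill_value, which carries no meaning by itself.
    """
    data = ma.filled(arr, fill_value)
    mask = ma.getmaskarray(arr)
    return data, mask

def join_masked(data, mask):
    """Hypothetical helper: rebuild the masked array from the two parts."""
    return ma.MaskedArray(data, mask=mask)

x = np.arange(30, dtype="i1").reshape(3, 10)
original = ma.masked_where(x % 5 > 3, x)

data, mask = split_masked(original)
restored = join_masked(data, mask)
print(data.dtype, mask.dtype, restored.dtype)
```

The round trip preserves int8 throughout, at the price of having to keep the data and mask variables in sync by hand.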
Issue Analytics
- Created 7 years ago
- Reactions:2
- Comments:9 (5 by maintainers)
Top GitHub Comments
@max-sixty Interesting! I wonder what it would take to make use of this “nullable integer data type” in xarray. Converting it to a standard numpy array (da.values) while retaining the dtype would not work, but one could add a new .to_maskedarray() method returning a numpy masked array; that would probably be easier than adding full support for masked arrays.

Pandas 1.0 uses pd.NA for integer, boolean, and string dtypes: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values
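As a rough illustration of how these two ideas could fit together, the sketch below converts a pandas nullable integer array into a numpy masked array without leaving int8. The `to_maskedarray` function here is a hypothetical standalone helper written for this example, not an existing xarray or pandas method, and the placeholder value 0 under the mask is an arbitrary choice.

```python
import numpy as np
import numpy.ma as ma
import pandas as pd

def to_maskedarray(values, dtype, fill_value=0):
    """Hypothetical helper: pandas nullable array -> numpy masked array.

    Missing entries (pd.NA) become masked; the data buffer holds
    fill_value at those positions so it can stay an integer array.
    """
    mask = np.asarray(values.isna())
    data = values.to_numpy(dtype=dtype, na_value=fill_value)
    return ma.MaskedArray(data, mask=mask)

# "Int8" (capital I) is the pandas nullable 8-bit integer dtype.
counts = pd.array([0, 1, 2, 3, None, 5], dtype="Int8")
masked = to_maskedarray(counts, dtype="int8")
print(counts.dtype, masked.dtype)
print(masked)
```

Both the pandas array and the resulting masked array keep an 8-bit integer buffer alongside a boolean mask, so no float promotion takes place.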