
Use masked arrays while preserving int


A great beauty of NumPy's masked arrays is that they work with any dtype, since they do not rely on NaN. Unfortunately, when I put my data into an xarray.Dataset, it converts ints to floats, as shown below:

In [137]: x = arange(30, dtype="i1").reshape(3, 10)

In [138]: xr.Dataset({"count": (["x", "y"], ma.masked_where(x%5>3, x))}, coords={"x": range(3), "y":
     ...: range(10)})
Out[138]:
<xarray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 0 1 2
Data variables:
    count    (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ...

This happens in xarray's internal function _maybe_promote.
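To illustrate why the promotion happens, here is a hypothetical, simplified helper. It is not xarray's actual _maybe_promote, whose exact rules differ between versions. It shows the underlying constraint: an integer dtype has no NaN, so representing masked entries as NaN forces a float dtype, while the masked array from the session above keeps int8 because its mask is a separate boolean array.

```python
import numpy as np

def promote_for_missing(dtype):
    """Hypothetical, simplified illustration of NaN-based promotion.

    NOT xarray's actual _maybe_promote; the real rules differ between versions.
    Returns a dtype that can hold a missing-value marker, plus that marker.
    """
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return dtype, np.nan            # floats already have NaN
    if np.issubdtype(dtype, np.integer):
        return np.dtype("f8"), np.nan   # integers have no NaN, so widen to float64
    return np.dtype(object), np.nan     # fallback for any other dtype

# Same data as the session above: the mask lives in a separate boolean array
x = np.arange(30, dtype="i1").reshape(3, 10)
masked = np.ma.masked_where(x % 5 > 3, x)

# Replacing the mask with NaN requires the promoted (float) dtype
new_dtype, fill = promote_for_missing(masked.dtype)
as_nan = np.where(np.ma.getmaskarray(masked), fill, masked.data.astype(new_dtype))
print(masked.dtype, "->", as_nan.dtype)
```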

Such type “promotion” is unaffordable for me: it would increase the memory consumption of my multi-gigabyte arrays by a factor of 4. Moreover, many of my integer-dtype fields are bit arrays, for which a floating-point representation is not desirable.
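Both costs can be checked with a short NumPy sketch. The memory factor is simply the ratio of item sizes (for example, int16 to float64 is a factor of 4, int8 to float64 a factor of 8). For bit fields, NumPy's bitwise ufuncs are defined only for integer (and boolean) types, so a float64 copy cannot be bit-tested, whereas numpy.ma.bitwise_and on a masked uint8 array keeps the dtype and the mask. The flag values are made up for illustration.

```python
import numpy as np

# Memory growth when an integer dtype is promoted to float64
factors = {src: np.dtype("f8").itemsize // np.dtype(src).itemsize for src in ("i1", "i2", "i4")}
print(factors)

# Illustrative bit-flag field with one masked element
flags = np.ma.masked_array(
    np.array([0b0101, 0b0011, 0b1000], dtype=np.uint8),
    mask=[False, True, False],
)
# np.ma.bitwise_and keeps the uint8 dtype and propagates the mask
lowest_bit = np.ma.bitwise_and(flags, np.uint8(0b0001))
print(lowest_bit)

# The same bit test on a float64 copy is rejected by NumPy
float_bit_test_failed = False
try:
    np.bitwise_and(flags.data.astype(np.float64), np.uint8(0b0001))
except TypeError:
    float_bit_test_failed = True
print("bitwise_and on float64 raises TypeError:", float_bit_test_failed)
```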

It would greatly benefit xarray if it could use masking while preserving the dtype of the input data.
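Until something like that exists, one possible workaround, sketched here as an assumption rather than an xarray feature, is to keep the integer data and its boolean mask as two separate variables, so xarray never needs a NaN fill value. The variable name count_mask is hypothetical; the masked array is rebuilt on demand with numpy.ma.

```python
import numpy as np
import xarray as xr

x = np.arange(30, dtype="i1").reshape(3, 10)
mask = x % 5 > 3

# Hypothetical workaround: store data and mask side by side, both unpromoted
ds = xr.Dataset(
    {"count": (("x", "y"), x), "count_mask": (("x", "y"), mask)},
    coords={"x": np.arange(3), "y": np.arange(10)},
)

# Rebuild a NumPy masked array when needed; the int8 dtype is untouched
count = np.ma.masked_array(ds["count"].values, mask=ds["count_mask"].values)
print(ds["count"].dtype, count.dtype, count.count())
```

The cost of this design is bookkeeping: every operation that should respect the mask has to apply it explicitly.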

(See also: Stack Overflow question)

Issue Analytics

Created: 7 years ago
Reactions: 2
Comments: 9 (5 by maintainers)

Top GitHub Comments

4 reactions

gerritholl commented, Jan 24, 2019

@max-sixty Interesting! I wonder what it would take to make use of this “nullable integer data type” in xarray. It wouldn’t work to convert it to a standard numpy array (da.values) while retaining the dtype, but one could add a new .to_maskedarray() method returning a numpy masked array; that would probably be easier than adding full support for masked arrays.
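The “nullable integer data type” here is presumably pandas' extension dtype such as "Int8". The conversion proposed in the comment can be sketched outside xarray with a hypothetical helper, nullable_to_masked (not an existing pandas or xarray function), that maps a pandas nullable-integer array to a NumPy masked array without going through float.

```python
import numpy as np
import pandas as pd

def nullable_to_masked(arr):
    """Hypothetical helper: pandas nullable-integer array -> NumPy masked array.

    Missing entries become masked; the placeholder 0 written underneath them
    is arbitrary and hidden by the mask.
    """
    numpy_dtype = arr.dtype.numpy_dtype          # e.g. Int8 -> int8
    data = arr.to_numpy(dtype=numpy_dtype, na_value=0)
    return np.ma.masked_array(data, mask=np.asarray(arr.isna()))

values = pd.array([3, None, 7], dtype="Int8")
masked = nullable_to_masked(values)
print(masked.dtype, masked.mask.tolist())
```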

2 reactions

gerritholl commented, Jan 31, 2020

Pandas 1.0 uses pd.NA as the missing-value marker for its nullable integer, boolean, and string dtypes: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values
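A small sketch of what this looks like in pandas (version 1.0 or later assumed): each nullable dtype reports a missing entry as the pd.NA scalar, and the integer array keeps an int8 value buffer alongside its mask, so arithmetic propagates pd.NA instead of switching to float.

```python
import pandas as pd

# Nullable extension dtypes that use pd.NA for missing entries
ints = pd.array([1, None, 3], dtype="Int8")
bools = pd.array([True, None, False], dtype="boolean")
strings = pd.array(["a", None, "c"], dtype="string")

for arr in (ints, bools, strings):
    print(arr.dtype, arr[1] is pd.NA)

# Values are stored as int8 next to a separate mask
print(ints.dtype.numpy_dtype)

# pd.NA propagates through arithmetic; the result stays a nullable integer array
shifted = ints + 1
print(shifted.dtype, shifted[1] is pd.NA)
```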
