
Add ability to change underlying array type


Is your feature request related to a problem? Please describe.

In order to use Xarray with alternative array types like CuPy, the user needs to be able to specify the underlying array type without digging into internals.

Right now I’m doing something like this.

import xarray as xr
import cupy as cp

ds = xr.tutorial.load_dataset("air_temperature")
ds.air.data = cp.asarray(ds.air.data)

However, this will become burdensome when there are many data arrays, and it feels brittle and prone to errors.
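The per-variable assignment above generalizes to a loop over every data variable. Below is a minimal sketch, assuming a helper named convert_data_vars (hypothetical, not an Xarray API) that accepts any asarray-style callable. Here NumPy stands in for CuPy so the sketch runs without a GPU.

```python
import xarray as xr


def convert_data_vars(ds: xr.Dataset, asarray) -> xr.Dataset:
    """Hypothetical helper: pass every data variable's array through ``asarray``.

    ``asarray`` is any callable such as ``cupy.asarray``. Coordinates
    (including index variables) are left untouched.
    """
    out = ds.copy(deep=False)  # new Dataset object; ``ds`` itself is not modified
    for name, da in ds.data_vars.items():
        # copy(data=...) keeps dims, coords and attrs but swaps in the new array
        out[name] = da.copy(deep=False, data=asarray(da.data))
    return out
```

With CuPy installed, the call would be `convert_data_vars(ds, cp.asarray)`. Dask-backed variables would need a converter that stays lazy, because calling the converter directly on a Dask array is not the same as converting each of its chunks.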

As I see it, the conversion could instead be done in one of two places: on load, or via a utility method.

Currently Xarray supports NumPy and Dask arrays well. NumPy is the default, and you request Dask arrays either by passing the chunks kwarg to an open_ function or by calling .chunk() on a Dataset or DataArray.

Side note: there are a few places where the Dask array API bleeds into Xarray for compatibility; the chunk kwarg/method is one, and the .compute() method is another. I’m hesitant to do this for other array types; however, surfacing the cupy.ndarray.get method could feel natural for CuPy users. For now, though, I think it would be best to treat Dask as a special case and try to be generic for everything else.

Describe the solution you’d like

For other array types, I would like to propose adding an asarray kwarg to the open_ methods and an .asarray() method to Dataset and DataArray. This should accept the array type (cupy.ndarray), the asarray function (cp.asarray), or preferably either.

This would result in something like the following.

import xarray as xr
import cupy as cp

ds = xr.open_mfdataset("/path/to/files/*.nc", asarray=cp.ndarray)

# or

ds = xr.open_mfdataset("/path/to/files/*.nc")
gds = ds.asarray(cp.ndarray)

These operations would convert all data arrays to CuPy arrays. If ds is backed by Dask arrays, it would use map_blocks to cast each block to the appropriate array type.
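The map_blocks behaviour described above can be sketched at the level of a single array. The helper name cast_array is hypothetical. Only dask.array.Array.map_blocks is a real API, and Dask is treated as optional.

```python
def cast_array(data, asarray):
    """Hypothetical helper: convert ``data`` with ``asarray``, keeping Dask arrays lazy."""
    try:
        import dask.array as dsa
    except ImportError:  # Dask is optional; plain arrays are converted eagerly
        dsa = None
    if dsa is not None and isinstance(data, dsa.Array):
        # map_blocks applies ``asarray`` to each chunk when the graph is computed.
        # With no meta given, Dask infers the output type (e.g. cupy.ndarray)
        # by calling ``asarray`` on a zero-size sample array.
        return data.map_blocks(asarray)
    return asarray(data)
```

Keeping the Dask branch lazy means no chunk is moved or converted until .compute() is called, which matches how the chunks kwarg already behaves.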

It is still unclear what to do about index variables, which are currently of type pandas.Index. For CuPy, it may be more appropriate to use a cudf.Index instead, so that both the data and the index live on the GPU. However, this would add a dependency on cuDF and could increase complexity.

Describe alternatives you’ve considered

Instead of an asarray kwarg/method, something like to_cupy/from_cupy could be added. However, I feel this makes less sense because the type of the Xarray object is not changing, only that of the underlying data structure.

Another option would be to go higher level. For example, a gpu kwarg and to_gpu/from_gpu methods could be added in the same way. This would abstract things even further and give users a choice about hardware rather than software. This would also be a fine solution, but I think it may special-case too much, and a more generic solution would be better.

Additional context

Related to #4212.

I’m keen to start implementing this, but would like some discussion/feedback before I dive in.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
jthielen commented, Jul 17, 2020

@jacobtomlinson Makes complete sense! Just wanted to make sure the option was considered as a possibility.

0 reactions
dcherian commented, Dec 5, 2020

The indexes story will change soon; we may even have our own index classes.

We should have pretty decent support for NEP-18 arrays in DataArray.data though, so IMO that’s the best thing to try out and see where the issues remain.

NEP 35 is cool; it looks like we should use it in our *_like functions.
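NEP 35 (NumPy 1.20+) adds a like= argument to array-creation functions such as np.full. This lets those functions dispatch on a reference array's type through __array_function__. A minimal sketch of how a *_like-style helper could use it follows. The helper name full_matching is hypothetical, and this is not Xarray's actual implementation.

```python
import numpy as np


def full_matching(reference, shape, fill_value):
    """Hypothetical helper: a filled array from the same library and dtype as ``reference``.

    Requires NumPy >= 1.20. If ``reference`` implements ``__array_function__``
    (e.g. a cupy.ndarray), that library decides the result type instead of NumPy.
    """
    return np.full(shape, fill_value, dtype=reference.dtype, like=reference)
```

np.full_like already dispatches on its array argument through NEP 18. like= matters for creation functions that only receive a shape, for instance when the new array's shape differs from the reference.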
