How to build a `Regridder` for a grid that does not fit in memory

I have a grid that doesn’t fit in memory (121,865 x 182,437):

<xarray.DataArray (lat: 182437, lon: 121865)>
dask.array<truediv, shape=(182437, 121865), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
Coordinates:
  * lat          (lat) float64 1.612e+06 1.612e+06 ... -2.136e+05 -2.136e+05
  * lon          (lon) float64 -2.228e+05 -2.228e+05 ... 9.968e+05 9.968e+05
    spatial_ref  int64 0

I’d like to regrid into one that does (200 x 300):

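import numpy as np
import xarray

# minX, maxX, minY, maxY, XDIM and xy_ratio are defined elsewhere (not shown).
ds_out = xarray.Dataset(
    {
        'lon': (['lon'], np.linspace(minX, maxX, XDIM)),
        'lat': (['lat'], np.linspace(
            minY, maxY, int(XDIM * xy_ratio))
        )
    }
)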

I understand that dask support is built into the library to enable out-of-core computations. But, when I try to build the Regridder object,

regridder = xesmf.Regridder(
    xarray.Dataset(
        {'ndvi': ndvi}
    ), 
    ds_out, 
    'bilinear'
)

I get the following error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-47-4174658fbd49> in <module>
----> 1 regridder = xesmf.Regridder(
      2     xarray.Dataset(
      3         {'ndvi': ndvi.rename({'x': 'lon', 'y': 'lat'})}
      4     ), 
      5     ds_out,

/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in __init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, **kwargs)
    771             grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
    772         else:
--> 773             grid_in, shape_in, input_dims = ds_to_ESMFgrid(
    774                 ds_in, need_bounds=need_bounds, periodic=periodic
    775             )

/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
    113     else:
    114         dim_names = None
--> 115     lon, lat = as_2d_mesh(np.asarray(lon), np.asarray(lat))
    116 
    117     if 'mask' in ds:

/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in as_2d_mesh(lon, lat)
     28         assert lon.shape == lat.shape, 'lon and lat should have same shape'
     29     elif (lon.ndim, lat.ndim) == (1, 1):
---> 30         lon, lat = np.meshgrid(lon, lat)
     31     else:
     32         raise ValueError('lon and lat should be both 1D or 2D')

<__array_function__ internals> in meshgrid(*args, **kwargs)

/opt/conda/lib/python3.8/site-packages/numpy/lib/function_base.py in meshgrid(copy, sparse, indexing, *xi)
   4299 
   4300     if copy:
-> 4301         output = [x.copy() for x in output]
   4302 
   4303     return output

/opt/conda/lib/python3.8/site-packages/numpy/lib/function_base.py in <listcomp>(.0)
   4299 
   4300     if copy:
-> 4301         output = [x.copy() for x in output]
   4302 
   4303     return output

MemoryError: Unable to allocate 166. GiB for an array with shape (182437, 121865) and data type float64
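
As a rough check of the figure in the error message, the size of one dense 2D coordinate array can be worked out from the shape and dtype reported above. This is only a back-of-the-envelope sketch: `np.meshgrid` returns one full-size array per input (one for `lon`, one for `lat`), so the total would be roughly twice the single-array figure.

```python
import numpy as np

# Shape and dtype taken from the MemoryError message above.
shape = (182437, 121865)                   # (lat, lon)
itemsize = np.dtype("float64").itemsize    # bytes per element

# Memory for one dense 2D coordinate array, in GiB.
one_array_gib = shape[0] * shape[1] * itemsize / 2**30

# meshgrid produces two such arrays (lon and lat).
both_arrays_gib = 2 * one_array_gib

print(f"one 2D coordinate array: {one_array_gib:.1f} GiB")
print(f"lon and lat meshes together: {both_arrays_gib:.1f} GiB")
```

The single-array value agrees with the roughly 166 GiB in the traceback. The allocation comes from the coordinate meshes, before any of the dask-backed data values are touched.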

Is there a way to build the Regridder so that the input and output do not need to be read into memory in full?

Thank you very much for your consideration and for an awesome package!

Top GitHub Comments

raphaeldussin commented, Sep 27, 2021

If the remapping died in xesmf due to a memory limit, the only way to solve the issue is to throw more RAM at the problem. This usually means using more compute nodes, which increases both the number of cores available and the total memory. The example I was giving ran on several hundred cores.

On a different note, it seems that you are trying to reduce your data. I wonder if using xarray's `coarsen` as a first step, say to average over geographical bins, would be more appropriate than a bilinear interpolation.
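
A minimal sketch of the coarsening suggestion, assuming a small random array as a stand-in for the real NDVI grid (the sizes, coordinate ranges, and target shape here are illustrative, not taken from the actual data). It averages over non-overlapping blocks with `DataArray.coarsen`:

```python
import numpy as np
import xarray as xr

# Small stand-in for the large grid; sizes and values are hypothetical.
lat = np.linspace(1.6e6, -2.1e5, 1200)
lon = np.linspace(-2.2e5, 9.9e5, 800)
ndvi = xr.DataArray(
    np.random.default_rng(0).random((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="ndvi",
)

# Desired output size (illustrative): 300 rows (lat) by 200 columns (lon).
target_lat, target_lon = 300, 200

# Integer block sizes that bring the grid down to about the target size.
fy = ndvi.sizes["lat"] // target_lat
fx = ndvi.sizes["lon"] // target_lon

# Mean over fy x fx blocks; boundary="trim" drops cells that do not fill
# a complete block at the edge.
coarse = ndvi.coarsen(lat=fy, lon=fx, boundary="trim").mean()

print(dict(coarse.sizes))
```

For the grid in the question, the same integer division gives block sizes of 608 along `lat` (182437 // 300) and 609 along `lon` (121865 // 200), with the few leftover edge cells trimmed. On a dask-backed array the call should stay lazy until the result is computed, so the full grid never has to be loaded at once.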

darribas commented, Sep 28, 2021

Thank you very much @raphaeldussin, coarsen worked for what I had in mind with that use case!

I’m closing the issue as I consider it answered but feel free to re-open if you think it’ll be useful for some reason.