How to build a `Regridder` for a grid that does not fit in memory

I have a grid that doesn’t fit in memory (121,865 x 182,437):

<xarray.DataArray (lat: 182437, lon: 121865)>
dask.array<truediv, shape=(182437, 121865), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
Coordinates:
  * lat          (lat) float64 1.612e+06 1.612e+06 ... -2.136e+05 -2.136e+05
  * lon          (lon) float64 -2.228e+05 -2.228e+05 ... 9.968e+05 9.968e+05
    spatial_ref  int64 0

I’d like to regrid into one that does (200 x 300):

ds_out = xarray.Dataset(
    {
        'lon': (['lon'], np.linspace(minX, maxX, XDIM)),
        'lat': (['lat'], np.linspace(
            minY, maxY, int(XDIM * xy_ratio))
        )
    }
)

I understand that dask support is built into the library to enable out-of-core computations. But, when I try to build the Regridder object,

regridder = xesmf.Regridder(
    xarray.Dataset(
        {'ndvi': ndvi}
    ), 
    ds_out, 
    'bilinear'
)

I get the following error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-47-4174658fbd49> in <module>
----> 1 regridder = xesmf.Regridder(
      2     xarray.Dataset(
      3         {'ndvi': ndvi.rename({'x': 'lon', 'y': 'lat'})}
      4     ), 
      5     ds_out,

/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in __init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, **kwargs)
    771             grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
    772         else:
--> 773             grid_in, shape_in, input_dims = ds_to_ESMFgrid(
    774                 ds_in, need_bounds=need_bounds, periodic=periodic
    775             )

/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
    113     else:
    114         dim_names = None
--> 115     lon, lat = as_2d_mesh(np.asarray(lon), np.asarray(lat))
    116 
    117     if 'mask' in ds:

/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in as_2d_mesh(lon, lat)
     28         assert lon.shape == lat.shape, 'lon and lat should have same shape'
     29     elif (lon.ndim, lat.ndim) == (1, 1):
---> 30         lon, lat = np.meshgrid(lon, lat)
     31     else:
     32         raise ValueError('lon and lat should be both 1D or 2D')

<__array_function__ internals> in meshgrid(*args, **kwargs)

/opt/conda/lib/python3.8/site-packages/numpy/lib/function_base.py in meshgrid(copy, sparse, indexing, *xi)
   4299 
   4300     if copy:
-> 4301         output = [x.copy() for x in output]
   4302 
   4303     return output

/opt/conda/lib/python3.8/site-packages/numpy/lib/function_base.py in <listcomp>(.0)
   4299 
   4300     if copy:
-> 4301         output = [x.copy() for x in output]
   4302 
   4303     return output

MemoryError: Unable to allocate 166. GiB for an array with shape (182437, 121865) and data type float64
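The size in the error message can be checked by hand. The traceback shows `np.meshgrid` expanding the 1D `lon` and `lat` coordinates into full 2D arrays, so each one holds one float64 per grid cell. A minimal sketch of that arithmetic, using only the shape and dtype reported in the error (the variable names are illustrative):

```python
import numpy as np

# Shape and dtype taken from the MemoryError message above.
shape = (182437, 121865)  # (lat, lon)
itemsize = np.dtype("float64").itemsize  # bytes per element

# One dense 2D coordinate array, as np.meshgrid would create it.
bytes_per_mesh = shape[0] * shape[1] * itemsize
gib_per_mesh = bytes_per_mesh / 2**30

print(f"{gib_per_mesh:.1f} GiB per 2D coordinate array")
```

Since `np.meshgrid` returns one such array for `lon` and another for `lat`, the coordinates alone would need roughly twice that amount, before any regridding weights are computed.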

Is there a way to build the Regridder so that the input and output do not need to be read into memory in full?

Thank you very much for your consideration and for an awesome package!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
raphaeldussin commented, Sep 27, 2021

If the remapping died in xesmf due to the memory limit, the only way to solve the issue is to throw more RAM at the problem. This usually means using more compute nodes, which increases both the number of cores available and the total memory. The example I gave ran on several hundred cores.

On a different note, it seems that you are trying to reduce your data. I wonder if using xarray's coarsen as a first step, say, to average over geographical bins, would be more appropriate than a bilinear interpolation.
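For readers who want to try the coarsen suggestion, here is a minimal sketch using `DataArray.coarsen(...).mean()`. It assumes dimensions named `lat` and `lon`, uses a small synthetic array in place of the real dask-backed `ndvi`, and the window sizes are hypothetical; for the real grid they would be chosen so the result lands near the desired output shape.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the large NDVI grid (assumption: the real
# array is dask-backed with dimensions named "lat" and "lon").
ndvi = xr.DataArray(
    np.arange(12 * 18, dtype="float64").reshape(12, 18),
    dims=("lat", "lon"),
    coords={
        "lat": np.linspace(60.0, 49.0, 12),
        "lon": np.linspace(0.0, 17.0, 18),
    },
    name="ndvi",
)

# Hypothetical window sizes: average every 4 x 6 block of cells.
# boundary="trim" drops leftover cells that do not fill a whole window.
coarse = ndvi.coarsen(lat=4, lon=6, boundary="trim").mean()

print(coarse.shape)
```

Each output cell is the mean of one block of input cells, so this is an aggregation rather than an interpolation, and it never needs the dense 2D coordinate mesh that `Regridder` tried to build.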

0 reactions
darribas commented, Sep 28, 2021

Thank you very much @raphaeldussin, coarsen worked for what I had in mind with that use case!

I’m closing the issue as I consider it answered but feel free to re-open if you think it’ll be useful for some reason.
