How to build a `Regridder` for a grid that does not fit in memory
I have a grid that doesn’t fit in memory (121,865 x 182,437):
<xarray.DataArray (lat: 182437, lon: 121865)>
dask.array<truediv, shape=(182437, 121865), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
Coordinates:
  * lat          (lat) float64 1.612e+06 1.612e+06 ... -2.136e+05 -2.136e+05
  * lon          (lon) float64 -2.228e+05 -2.228e+05 ... 9.968e+05 9.968e+05
    spatial_ref  int64 0
I’d like to regrid into one that does (200 x 300):
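import numpy as np
import xarray

# minX, maxX, minY, maxY, XDIM and xy_ratio are defined earlier in the session.
ds_out = xarray.Dataset(
    {
        'lon': (['lon'], np.linspace(minX, maxX, XDIM)),
        'lat': (['lat'], np.linspace(minY, maxY, int(XDIM * xy_ratio))),
    }
)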
I understand that dask support is built into the library to enable out-of-core computations. But when I try to build the Regridder object,
regridder = xesmf.Regridder(
    xarray.Dataset(
        {'ndvi': ndvi}
    ),
    ds_out,
    'bilinear'
)
I get the following error:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-47-4174658fbd49> in <module>
----> 1 regridder = xesmf.Regridder(
2 xarray.Dataset(
3 {'ndvi': ndvi.rename({'x': 'lon', 'y': 'lat'})}
4 ),
5 ds_out,
/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in __init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, **kwargs)
771 grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
772 else:
--> 773 grid_in, shape_in, input_dims = ds_to_ESMFgrid(
774 ds_in, need_bounds=need_bounds, periodic=periodic
775 )
/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
113 else:
114 dim_names = None
--> 115 lon, lat = as_2d_mesh(np.asarray(lon), np.asarray(lat))
116
117 if 'mask' in ds:
/opt/conda/lib/python3.8/site-packages/xesmf/frontend.py in as_2d_mesh(lon, lat)
28 assert lon.shape == lat.shape, 'lon and lat should have same shape'
29 elif (lon.ndim, lat.ndim) == (1, 1):
---> 30 lon, lat = np.meshgrid(lon, lat)
31 else:
32 raise ValueError('lon and lat should be both 1D or 2D')
<__array_function__ internals> in meshgrid(*args, **kwargs)
/opt/conda/lib/python3.8/site-packages/numpy/lib/function_base.py in meshgrid(copy, sparse, indexing, *xi)
4299
4300 if copy:
-> 4301 output = [x.copy() for x in output]
4302
4303 return output
/opt/conda/lib/python3.8/site-packages/numpy/lib/function_base.py in <listcomp>(.0)
4299
4300 if copy:
-> 4301 output = [x.copy() for x in output]
4302
4303 return output
MemoryError: Unable to allocate 166. GiB for an array with shape (182437, 121865) and data type float64
Is there a way to build the Regridder so that the input and output do not need to be read into memory in full?
Thank you very much for your consideration and for an awesome package!
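For reference, the size in the error message can be reproduced with a quick calculation. This is only a sketch that assumes the grid shape and dtype shown in the traceback. The small meshgrid call at the end illustrates that np.meshgrid expands the 1D coordinates into full 2D arrays, one for lon and one for lat, so two arrays of this size would be needed before any regridding starts.

```python
import numpy as np

# Shape and dtype taken from the MemoryError message above.
n_lat, n_lon = 182437, 121865
itemsize = np.dtype("float64").itemsize  # 8 bytes per value

# Memory needed for ONE dense 2D coordinate array of that shape.
bytes_per_mesh = n_lat * n_lon * itemsize
gib_per_mesh = bytes_per_mesh / 2**30
print(f"{gib_per_mesh:.1f} GiB per 2D coordinate array")  # about 166 GiB

# Tiny illustration: meshgrid turns 1D lon/lat into two full 2D arrays.
lon_1d = np.linspace(0.0, 3.0, 4)
lat_1d = np.linspace(0.0, 2.0, 3)
lon_2d, lat_2d = np.meshgrid(lon_1d, lat_1d)
print(lon_2d.shape, lat_2d.shape)  # both are (len(lat), len(lon))
```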
Top GitHub Comments
If the remapping died in xesmf due to the memory limit, the only way to solve the issue is to throw more RAM at the problem. This usually means using more compute nodes, which extends both the number of cores available and the memory. The example I was giving ran on several hundred cores.
On a different note, it seems that you are trying to reduce your data. I wonder if using xarray's coarsen as a first step, to, say, average over geographical bins, would be more appropriate than a bilinear interpolation.
Thank you very much @raphaeldussin, coarsen worked for what I had in mind with that use case! I’m closing the issue as I consider it answered, but feel free to re-open if you think it’ll be useful for some reason.
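For anyone landing here later, the coarsen approach suggested above could look roughly like the sketch below. It is not the exact code used in this issue: the small synthetic array, the target sizes and the variable names are all assumptions for illustration. The array here is NumPy-backed to keep the example light; the same coarsen call also works on a dask-backed DataArray, where it stays lazy until computed.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the large NDVI grid (hypothetical data).
lat = np.linspace(1.6e6, -2.1e5, 400)
lon = np.linspace(-2.2e5, 9.9e5, 600)
rng = np.random.default_rng(0)
ndvi = xr.DataArray(
    rng.random((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="ndvi",
)

# Hypothetical target size; integer block factors are derived from it.
target_lat, target_lon = 200, 300
f_lat = ndvi.sizes["lat"] // target_lat
f_lon = ndvi.sizes["lon"] // target_lon

# Average over non-overlapping blocks. boundary="trim" drops leftover
# rows/columns when the size is not an exact multiple of the factor.
coarse = ndvi.coarsen(lat=f_lat, lon=f_lon, boundary="trim").mean()
print(coarse.shape)
```

Note that this averages over blocks of cells rather than interpolating, so the output coordinates are the block centres and may not match a pre-defined target grid exactly.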