Creating weights from multiple threads/processes fails (ESMC_GridCreateNoPeriDim)
What happened:
When using xesmf inside a parallel framework, an opaque error is raised. I’ve observed this behavior using dask’s threaded and distributed schedulers.
What you expected to happen:
I expected to be able to use xesmf within multiple processes. Or, if this is not possible, a descriptive error and/or documentation on the subject.
Minimal Complete Verifiable Example:
This example is a slightly modified version of the basic example from the xesmf docs.
import numpy as np
import dask
import xesmf as xe
import xarray as xr


@dask.delayed
def regrid(tslice):
    ds = xr.tutorial.open_dataset("air_temperature").isel(time=tslice)
    ds_out = xr.Dataset(
        {
            "lat": (["lat"], np.arange(16, 75, 1.0)),
            "lon": (["lon"], np.arange(200, 330, 1.5)),
        }
    )
    regridder = xe.Regridder(ds, ds_out, "bilinear")
    dr_out = regridder(ds)
    return dr_out


tasks = [regrid(slice(0, 10)), regrid(slice(10, 20))]

# this works
dask.compute(tasks, scheduler='single-threaded')

# this fails
dask.compute(tasks, scheduler='threads')
The traceback is here:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_290/220165509.py in <module>
1 # this fails
----> 2 dask.compute(tasks, scheduler='threads')
/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
568 postcomputes.append(x.__dask_postcompute__())
569
--> 570 results = schedule(dsk, keys, **kwargs)
571 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
572
/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
77 pool = MultiprocessingPoolExecutor(pool)
78
---> 79 results = get_async(
80 pool.submit,
81 pool._max_workers,
/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
505 _execute_task(task, data) # Re-execute locally
506 else:
--> 507 raise_exception(exc, tb)
508 res, worker_id = loads(res_info)
509 state["cache"][key] = res
/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
313 if exc.__traceback__ is not tb:
314 raise exc.with_traceback(tb)
--> 315 raise exc
316
317
/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
218 try:
219 task, data = loads(task_info)
--> 220 result = _execute_task(task, data)
221 id = get_id()
222 result = dumps((result, id))
/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
117 # temporaries by their reference count and can execute certain
118 # operations in-place.
--> 119 return func(*(_execute_task(a, cache) for a in args))
120 elif not ishashable(arg):
121 return arg
/tmp/ipykernel_290/2192809839.py in regrid(tslice)
8 }
9 )
---> 10 regridder = xe.Regridder(ds, ds_out, "bilinear")
11 dr_out = regridder(ds)
12 return dr_out
/srv/conda/envs/notebook/lib/python3.9/site-packages/xesmf/frontend.py in __init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, **kwargs)
768 grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
769 else:
--> 770 grid_in, shape_in, input_dims = ds_to_ESMFgrid(
771 ds_in, need_bounds=need_bounds, periodic=periodic
772 )
/srv/conda/envs/notebook/lib/python3.9/site-packages/xesmf/frontend.py in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
130 grid = Grid.from_xarray(lon.T, lat.T, periodic=periodic, mask=mask.T)
131 else:
--> 132 grid = Grid.from_xarray(lon.T, lat.T, periodic=periodic, mask=None)
133
134 if need_bounds:
/srv/conda/envs/notebook/lib/python3.9/site-packages/xesmf/backend.py in from_xarray(cls, lon, lat, periodic, mask)
109 # However, they actually need to be set explicitly,
110 # otherwise grid._coord_sys and grid._staggerloc will still be None.
--> 111 grid = cls(
112 np.array(lon.shape),
113 staggerloc=staggerloc,
/srv/conda/envs/notebook/lib/python3.9/site-packages/ESMF/util/decorators.py in new_func(*args, **kwargs)
79
80 esmp = esmpymanager.Manager(debug = False)
---> 81 return func(*args, **kwargs)
82 return new_func
83
/srv/conda/envs/notebook/lib/python3.9/site-packages/ESMF/api/grid.py in __init__(self, max_index, num_peri_dims, periodic_dim, pole_dim, coord_sys, coord_typekind, staggerloc, pole_kind, filename, filetype, reg_decomp, decompflag, is_sphere, add_corner_stagger, add_user_area, add_mask, varname, coord_names, tilesize, regDecompPTile, name)
408 self._struct = ESMP_GridStruct()
409 if self.num_peri_dims == 0:
--> 410 self._struct = ESMP_GridCreateNoPeriDim(self.max_index,
411 coordSys=coord_sys,
412 coordTypeKind=coord_typekind)
/srv/conda/envs/notebook/lib/python3.9/site-packages/ESMF/interface/cbindings.py in ESMP_GridCreateNoPeriDim(maxIndex, coordSys, coordTypeKind)
579 rc = lrc.value
580 if rc != constants._ESMP_SUCCESS:
--> 581 raise ValueError('ESMC_GridCreateNoPeriDim() failed with rc = '+str(rc)+
582 '. '+constants._errmsg)
583
ValueError: ESMC_GridCreateNoPeriDim() failed with rc = 545. Please check the log files (named "*ESMF_LogFile").
The ESMF_LogFile includes the following lines:
20211217 034835.110 ERROR PET0 ESMCI_VM.C:2168 ESMCI::VM::getCurrent() Internal error: Bad condition - - Could not determine current VM
20211217 034835.110 ERROR PET0 ESMCI_VM_F.C:1105 c_esmc_vmgetcurrent() Internal error: Bad condition - Internal subroutine call returned Error
20211217 034835.110 ERROR PET0 ESMF_VM.F90:5579 ESMF_VMGetCurrent() Internal error: Bad condition - Internal subroutine call returned Error
20211217 034835.110 ERROR PET0 ESMF_Grid.F90:29430 ESMF_GridCreateDistgridReg Internal error: Bad condition - Internal subroutine call returned Error
20211217 034835.110 ERROR PET0 ESMF_Grid.F90:10874 ESMF_GridCreateNoPeriDimR Internal error: Bad condition - Internal subroutine call returned Error
20211217 034835.110 ERROR PET0 ESMF_Grid_C.F90:78 f_esmf_gridcreatenoperidim Internal error: Bad condition - Internal subroutine call returned Error
20211217 034835.110 ERROR PET0 ESMCI_Grid.C:259 ESMCI::Grid::createnoperidim() Internal error: Bad condition - Internal subroutine call returned Error
20211217 034835.110 ERROR PET0 ESMC_Grid.C:83 ESMC_GridCreateNoPeriDim() Internal error: Bad condition - Internal subroutine call returned Error
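The first log line points at the likely root cause: ESMCI::VM::getCurrent() cannot find an ESMF virtual machine for the calling thread, which suggests the VM set up by ESMF.Manager() is either tied to the thread that initialized it or racing between threads. If that reading is right, the failure should reproduce with plain threads and no dask at all. Below is a hedged, untested sketch of such a dask-free reproduction using only esmpy (the module is imported as ESMF in the 8.2 packages); the grid arguments mirror what xesmf's backend passes and are otherwise arbitrary.

import threading

import numpy as np
import ESMF  # esmpy; imported as ESMF in the 8.2 packages

ESMF.Manager(debug=False)  # set up the ESMF VM on the main thread

def make_grid():
    # the same kind of call xesmf's backend makes: a 2D grid with no
    # periodic dimension, cell-center stagger, spherical degree coords
    ESMF.Grid(
        np.array([10, 20]),
        staggerloc=ESMF.StaggerLoc.CENTER,
        coord_sys=ESMF.CoordSys.SPH_DEG,
    )

t = threading.Thread(target=make_grid)
t.start()
t.join()  # if the VM analysis holds, the thread hits the same rc = 545 error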
Anything else we need to know?:
xref: https://github.com/JiaweiZhuang/xESMF/issues/88
Environment:
Output of xr.show_versions(), plus xesmf and ESMF versions:
INSTALLED VERSIONS
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1062-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.5.0
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: None
sparse: 0.13.0
setuptools: 59.4.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.30.1
sphinx: None
xesmf: 0.6.2
ESMF: 8.2.0
Top GitHub Comments
@rokuingh can you please take a look?
@rsdunlapiv - I’m not sure if this should work or not. If it’s not supposed to work, it may be nice to put a thread lock in place or, alternatively, to raise a more informative error.
Also, I think it would be good to restate my intended parallel behavior here. I want to generate regridding weights for two datasets in parallel. I do not want ESMF to do anything in parallel (or with MPI).
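On the thread-lock suggestion above, here is a minimal sketch of what that could look like on the caller's side, assuming the MCVE from the issue body. The lock and the choice to serialize only the Regridder construction are my own guesses, not anything xesmf provides; if the ESMF VM really is bound to a single thread, a lock alone would not help, but if the problem is a race in the lazy ESMF.Manager() initialization, it might.

import threading

import dask
import numpy as np
import xarray as xr
import xesmf as xe

esmf_lock = threading.Lock()  # hypothetical: serialize all ESMF calls

@dask.delayed
def regrid_locked(tslice):
    ds = xr.tutorial.open_dataset("air_temperature").isel(time=tslice)
    ds_out = xr.Dataset(
        {
            "lat": (["lat"], np.arange(16, 75, 1.0)),
            "lon": (["lon"], np.arange(200, 330, 1.5)),
        }
    )
    # only the weight generation touches ESMF; applying the weights is
    # scipy-based, so it stays outside the lock
    with esmf_lock:
        regridder = xe.Regridder(ds, ds_out, "bilinear")
    return regridder(ds)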
I will say that my first example (the most important one) does work if called with the multiprocessing scheduler. However, the second does not.
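For reference, a sketch of that working call, assuming dask's process-based scheduler is what is meant (dask accepts both 'processes' and the older 'multiprocessing' spelling):

# works: each task runs in its own process, so each worker gets its
# own independent ESMF runtime instead of sharing one across threads
dask.compute(tasks, scheduler='processes')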