Creating weights from multiple threads/processes fails (ESMC_GridCreateNoPeriDim)

What happened:

When using xesmf inside a parallel framework, an opaque error is raised. I’ve observed this behavior using dask’s threaded and distributed schedulers.

What you expected to happen:

I expected to be able to use xesmf within multiple processes. Or, if this is not possible, a descriptive error and/or documentation on the subject.

Minimal Complete Verifiable Example:

This simple example is just a slightly modified version of the basic example from the xesmf docs.

import numpy as np
import dask
import xesmf as xe
import xarray as xr


@dask.delayed
def regrid(tslice):
    ds = xr.tutorial.open_dataset("air_temperature").isel(time=tslice)
    ds_out = xr.Dataset(
        {
            "lat": (["lat"], np.arange(16, 75, 1.0)),
            "lon": (["lon"], np.arange(200, 330, 1.5)),
        }
    )
    regridder = xe.Regridder(ds, ds_out, "bilinear")
    dr_out = regridder(ds)
    return dr_out


tasks = [regrid(slice(0, 10)), regrid(slice(10, 20))]

# this works
dask.compute(tasks, scheduler='single-threaded')

# this fails
dask.compute(tasks, scheduler='threads')

The traceback is here:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_290/220165509.py in <module>
      1 # this fails
----> 2 dask.compute(tasks, scheduler='threads')

/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    568         postcomputes.append(x.__dask_postcompute__())
    569 
--> 570     results = schedule(dsk, keys, **kwargs)
    571     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    572 

/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     77             pool = MultiprocessingPoolExecutor(pool)
     78 
---> 79     results = get_async(
     80         pool.submit,
     81         pool._max_workers,

/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    505                             _execute_task(task, data)  # Re-execute locally
    506                         else:
--> 507                             raise_exception(exc, tb)
    508                     res, worker_id = loads(res_info)
    509                     state["cache"][key] = res

/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
    313     if exc.__traceback__ is not tb:
    314         raise exc.with_traceback(tb)
--> 315     raise exc
    316 
    317 

/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    218     try:
    219         task, data = loads(task_info)
--> 220         result = _execute_task(task, data)
    221         id = get_id()
    222         result = dumps((result, id))

/srv/conda/envs/notebook/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    117         # temporaries by their reference count and can execute certain
    118         # operations in-place.
--> 119         return func(*(_execute_task(a, cache) for a in args))
    120     elif not ishashable(arg):
    121         return arg

/tmp/ipykernel_290/2192809839.py in regrid(tslice)
      8         }
      9     )
---> 10     regridder = xe.Regridder(ds, ds_out, "bilinear")
     11     dr_out = regridder(ds)
     12     return dr_out

/srv/conda/envs/notebook/lib/python3.9/site-packages/xesmf/frontend.py in __init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, **kwargs)
    768             grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
    769         else:
--> 770             grid_in, shape_in, input_dims = ds_to_ESMFgrid(
    771                 ds_in, need_bounds=need_bounds, periodic=periodic
    772             )

/srv/conda/envs/notebook/lib/python3.9/site-packages/xesmf/frontend.py in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
    130         grid = Grid.from_xarray(lon.T, lat.T, periodic=periodic, mask=mask.T)
    131     else:
--> 132         grid = Grid.from_xarray(lon.T, lat.T, periodic=periodic, mask=None)
    133 
    134     if need_bounds:

/srv/conda/envs/notebook/lib/python3.9/site-packages/xesmf/backend.py in from_xarray(cls, lon, lat, periodic, mask)
    109         # However, they actually need to be set explicitly,
    110         # otherwise grid._coord_sys and grid._staggerloc will still be None.
--> 111         grid = cls(
    112             np.array(lon.shape),
    113             staggerloc=staggerloc,

/srv/conda/envs/notebook/lib/python3.9/site-packages/ESMF/util/decorators.py in new_func(*args, **kwargs)
     79 
     80         esmp = esmpymanager.Manager(debug = False)
---> 81         return func(*args, **kwargs)
     82     return new_func
     83 

/srv/conda/envs/notebook/lib/python3.9/site-packages/ESMF/api/grid.py in __init__(self, max_index, num_peri_dims, periodic_dim, pole_dim, coord_sys, coord_typekind, staggerloc, pole_kind, filename, filetype, reg_decomp, decompflag, is_sphere, add_corner_stagger, add_user_area, add_mask, varname, coord_names, tilesize, regDecompPTile, name)
    408             self._struct = ESMP_GridStruct()
    409             if self.num_peri_dims == 0:
--> 410                 self._struct = ESMP_GridCreateNoPeriDim(self.max_index,
    411                                                        coordSys=coord_sys,
    412                                                        coordTypeKind=coord_typekind)

/srv/conda/envs/notebook/lib/python3.9/site-packages/ESMF/interface/cbindings.py in ESMP_GridCreateNoPeriDim(maxIndex, coordSys, coordTypeKind)
    579     rc = lrc.value
    580     if rc != constants._ESMP_SUCCESS:
--> 581         raise ValueError('ESMC_GridCreateNoPeriDim() failed with rc = '+str(rc)+
    582                         '.    '+constants._errmsg)
    583 

ValueError: ESMC_GridCreateNoPeriDim() failed with rc = 545.    Please check the log files (named "*ESMF_LogFile").

The ESMF_LogFile includes the following lines:

20211217 034835.110 ERROR            PET0 ESMCI_VM.C:2168 ESMCI::VM::getCurrent() Internal error: Bad condition  - - Could not determine current VM
20211217 034835.110 ERROR            PET0 ESMCI_VM_F.C:1105 c_esmc_vmgetcurrent() Internal error: Bad condition  - Internal subroutine call returned Error
20211217 034835.110 ERROR            PET0 ESMF_VM.F90:5579 ESMF_VMGetCurrent() Internal error: Bad condition  - Internal subroutine call returned Error
20211217 034835.110 ERROR            PET0 ESMF_Grid.F90:29430 ESMF_GridCreateDistgridReg Internal error: Bad condition  - Internal subroutine call returned Error
20211217 034835.110 ERROR            PET0 ESMF_Grid.F90:10874 ESMF_GridCreateNoPeriDimR Internal error: Bad condition  - Internal subroutine call returned Error
20211217 034835.110 ERROR            PET0 ESMF_Grid_C.F90:78 f_esmf_gridcreatenoperidim Internal error: Bad condition  - Internal subroutine call returned Error
20211217 034835.110 ERROR            PET0 ESMCI_Grid.C:259 ESMCI::Grid::createnoperidim() Internal error: Bad condition  - Internal subroutine call returned Error
20211217 034835.110 ERROR            PET0 ESMC_Grid.C:83 ESMC_GridCreateNoPeriDim() Internal error: Bad condition  - Internal subroutine call returned Error

Anything else we need to know?:

xref: https://github.com/JiaweiZhuang/xESMF/issues/88

Environment:

Output of xr.show_versions() + xesmf + esmf

INSTALLED VERSIONS

commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1062-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.5.0
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: None
sparse: 0.13.0
setuptools: 59.4.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.30.1
sphinx: None
xesmf: 0.6.2
ESMF: 8.2.0

cc @rokuingh, @norlandrhagen, @theurich

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

rsdunlapiv commented, Dec 17, 2021 (1 reaction)

@rokuingh can you please take a look

jhamman commented, Dec 20, 2021 (0 reactions)

@rsdunlapiv - I’m not sure if this should work or not. If it’s not supposed to work, it would be nice to put a thread lock in place or, alternatively, raise a more informative error.
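To make the "thread lock" idea concrete, here is a minimal sketch of serializing a constructor that is not thread-safe behind one shared lock. The names `build_serialized` and `fake_builder` are hypothetical and are not part of xesmf or ESMF; `fake_builder` only stands in for a call such as `xe.Regridder(ds, ds_out, "bilinear")`. Nothing in this thread confirms that a lock avoids the ESMF error: the log reports "Could not determine current VM", so ESMF may need to be called from the thread that initialized it, in which case a lock alone would not be enough.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One process-wide lock shared by every worker thread.
_build_lock = threading.Lock()


def build_serialized(builder, *args, **kwargs):
    """Call builder(*args, **kwargs) while holding the shared lock."""
    with _build_lock:
        return builder(*args, **kwargs)


# Hypothetical stand-in for a constructor that must not run concurrently.
_active = 0      # builders currently running
max_active = 0   # highest concurrency observed


def fake_builder(name):
    global _active, max_active
    _active += 1
    max_active = max(max_active, _active)
    time.sleep(0.01)  # leave room for overlap if the lock were missing
    _active -= 1
    return f"weights-for-{name}"


names = ["a", "b", "c", "d"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda n: build_serialized(fake_builder, n), names))

print(results, max_active)
```

In the example from the issue, the equivalent change would be to replace the `xe.Regridder(ds, ds_out, "bilinear")` call inside `regrid` with `build_serialized(xe.Regridder, ds, ds_out, "bilinear")`. That serializes weight generation across threads, so it trades away the parallelism for that step.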

Also, I think it would be good to restate my intended parallel behavior here. I want to generate regridding weights for two datasets in parallel. I do not want ESMF to do anything in parallel (or with MPI).

I will say that my first example (most important) does work if called with the multiprocessing scheduler. However, the second does not:

...
/srv/conda/envs/notebook/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dump()
    600     def dump(self, obj):
    601         try:
--> 602             return Pickler.dump(self, obj)
    603         except RuntimeError as e:
    604             if "recursion" in e.args[0]:

ValueError: ctypes objects containing pointers cannot be pickled
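The final error message comes from ctypes itself and can be reproduced without xesmf or dask. The sketch below is an illustration only, assuming nothing about which object in the elided second example carried the pointer: a plain ctypes value survives a pickle round trip, while a ctypes pointer is rejected. This matters for process-based schedulers because task results are pickled before being sent back to the parent process.

```python
import ctypes
import pickle

value = ctypes.c_int(42)
ptr = ctypes.pointer(value)

# A simple ctypes value with no pointers can be pickled and restored.
restored = pickle.loads(pickle.dumps(value))

# A ctypes pointer (or any ctypes object containing one) cannot.
try:
    pickle.dumps(ptr)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(restored.value, error_message)
```

A common way around this kind of failure is to return only plain data (for example NumPy arrays or xarray objects) from worker processes and keep any object that wraps C handles inside the worker.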