Comparison with xarray's built-in interp()
Recently realized that xarray has a nice new interpolation module added by @crusaderky and @fujiisoup in pydata/xarray#2079 (also cc @shoyer, @rabernat).
It does have some overlap with xESMF (fortunately not too much), so I think it would be necessary to:
- Compare their advantages & limitations
- Identify their proper use cases
- Better define the development roadmap and avoid duplication of effort
interp() wraps scipy.interpolate, while xESMF wraps ESMPy’s regridding functionality. The subtle difference between “interpolation” and “regridding” is that the former often refers to traditional 1D (sometimes N-D) interpolation, mostly in Cartesian space, while the latter specifically concerns geospatial data on Earth’s sphere.
I personally think interp() is a great fit for:
- Interpolation over a 1D coordinate (e.g. vertical layers, time). I’ve been using Scipy for vertical interpolation, too. ESMPy does support 3D grids, but this is generally overkill.
- Data that are not on Earth’s sphere, for example output from other physical models. ESMPy can actually handle Cartesian coordinates, but everyone seems to use only spherical coordinates.
- High-dimensional regular grids (>=4D? rarely seen in Earth science but can occur in other physical sciences or machine learning). It seems that interp’s API tries to generalize to arbitrary dimensions, while xESMF is specific to horizontal regridding on the sphere.
- Sampling over a trajectory via “Advanced Interpolation”. ESMPy does have similar support via LocStream, but I personally don’t use it. Glad that xarray has native support for this feature.
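To make the first and last use cases concrete, here is a minimal sketch of interp() on a synthetic field. The grid, the coordinate names (z, y, x), the points dimension, and the trajectory values are all made up for illustration; the field is linear in each coordinate so that linear interpolation reproduces it exactly.

```python
import numpy as np
import xarray as xr

# Synthetic (z, y, x) field, linear in every coordinate (illustrative data only).
z = np.array([0.0, 500.0, 1000.0, 2000.0])
y = np.linspace(0.0, 10.0, 11)
x = np.linspace(0.0, 20.0, 21)
field = xr.DataArray(
    0.01 * z[:, None, None] + 3.0 * y[None, :, None] + 2.0 * x[None, None, :],
    coords={"z": z, "y": y, "x": x},
    dims=("z", "y", "x"),
)

# 1D interpolation along a single coordinate (here: vertical levels).
on_new_levels = field.interp(z=[250.0, 750.0, 1500.0])

# "Advanced interpolation": indexers that share a new dimension ("points")
# sample the field along a path instead of on the outer product of x and y.
traj_x = xr.DataArray([0.5, 4.25, 12.0, 19.5], dims="points")
traj_y = xr.DataArray([1.5, 2.0, 7.75, 9.0], dims="points")
along_track = field.sel(z=500.0).interp(x=traj_x, y=traj_y)
```

Here along_track has a single points dimension with one value per trajectory location. Passing plain lists for x and y instead would return values on the full 4 x 4 grid of combinations.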
For geospatial regridding tasks, xESMF has some important strengths. Many are already reviewed in the docs, but more specifically:
- Performance, especially with large data. xESMF reuses weights but scipy.interpolate does not. This simple test shows that xESMF is 16x faster than interp() once the weights are computed (computing the weights is also faster than interp). Admittedly, this performance gap will narrow on distributed cloud platforms, where I/O time dominates (pangeo-data/pangeo#334).
- Curvilinear grid (pydata/xarray#2281). This is the major reason why I wrote xESMF…
- Conservative algorithm, to conserve the integral for density-like fields such as air density, heat flux, emission intensity… (This algorithm is used everywhere in Earth science but is never taught in numerical analysis classes. Scipy is basically “everything in a numerical analysis textbook”, so unsurprisingly it has no conservative scheme.)
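The weight-reuse and conservation points can be illustrated with a toy 1D example. This is a sketch of the general idea only, not xESMF’s or ESMF’s actual implementation (which works with cell overlaps on the sphere); the function name conservative_weights, the grids, and the random data are hypothetical. Weights are computed once from cell overlaps, then applied to every time step as a matrix product.

```python
import numpy as np

def conservative_weights(src_edges, dst_edges):
    """Return W of shape (n_dst, n_src), where W[i, j] is the fraction of
    destination cell i that is covered by source cell j (1D overlap)."""
    src_lo, src_hi = src_edges[:-1], src_edges[1:]
    dst_lo, dst_hi = dst_edges[:-1], dst_edges[1:]
    overlap = np.minimum(dst_hi[:, None], src_hi[None, :]) - np.maximum(
        dst_lo[:, None], src_lo[None, :]
    )
    overlap = np.clip(overlap, 0.0, None)  # non-overlapping cells contribute 0
    return overlap / (dst_hi - dst_lo)[:, None]

src_edges = np.linspace(0.0, 1.0, 13)  # 12 fine source cells
dst_edges = np.linspace(0.0, 1.0, 6)   # 5 coarse destination cells

# The expensive step happens once and depends only on the two grids.
weights = conservative_weights(src_edges, dst_edges)

rng = np.random.default_rng(0)
density = rng.random((100, 12))  # 100 time steps of a density-like field

# Regridding every time step is a single matrix product with the same weights.
regridded = density @ weights.T  # shape (100, 5)

# The integral (cell value times cell width, summed) is preserved per time step.
src_integral = density @ np.diff(src_edges)
dst_integral = regridded @ np.diff(dst_edges)
```

Comparing src_integral and dst_integral with np.allclose confirms that the total is conserved, which pointwise linear interpolation does not guarantee in general.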
In short, interp is a general-purpose interpolation module; xESMF is a geospatial regridding package targeting Earth science needs. Their objectives look distinguishable. Should we consider merging some efforts? Or perhaps just let them evolve independently?
Issue Analytics
- Created 5 years ago
- Comments:9 (1 by maintainers)
Top GitHub Comments
I agree!
In the long term, we want an external interface that allows for extending xarray with custom index/grid objects as an explicit part of our data model, e.g., for geospatial indexing. This would allow for caching some indexing/regridding computations and potentially even allow third-party libraries to extend interp in xarray.
@dcherian I can work on this once I understand interpolation better than I currently do. It’s on my to-do list!