Contributor guide?
So far I’ve been developing xESMF on my own. There are sporadic PRs (#23, #27), but I am actually not sure how to best handle them. Given the increasing community interest, it would be useful to talk about how people can contribute to xESMF.
Contributing to xarray is a good reference on the software engineering side (style, testing, documentation, bug fixes…). Here I am thinking more about the science, usability, and algorithm aspects that are specific to xESMF.
There are several types of contributions I can think of:
1. Contribution to examples and tutorials
This is the easiest one and is highly welcome. I am very interested in how people regrid all kinds of data in different Earth science disciplines (environmental, atmospheric, oceanic, land, remote sensing…). I often see xESMF being successfully used to deal with grid meshes that I haven’t seen before (e.g. the “tri-polar grid” #14).
An example can be just a Jupyter notebook, focusing on one or more of the following aspects:
- A specific scientific application (e.g. converting CMIP5 data from multiple models' grids to a common grid for comparison, pangeo-data/pangeo#309)
- The type of grid mesh (e.g. WRF’s Lambert conformal projection, MITgcm’s lat-lon-cap, or even the Yin-Yang grid that most people have little experience with. Grid meshes are fun!)
- The choice of algorithm (e.g. while `bilinear` and `conservative` are most common, the nearest neighbor method is actually great for categorical data such as land type index)
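To illustrate why the algorithm matters for categorical data, here is a minimal NumPy sketch (not xESMF code). It uses 1-D linear interpolation as a stand-in for bilinear and a hand-rolled nearest-neighbor lookup; the land type values are made up. In xESMF itself the method is chosen by name when building a Regridder (e.g. 'bilinear', 'conservative', or 'nearest_s2d').

```python
import numpy as np

# Made-up 1-D "land type index" on a source grid: 0 = water, 1 = forest, 5 = urban.
src_x = np.array([0.0, 1.0, 2.0, 3.0])
src_cat = np.array([0, 1, 5, 5])

# Destination points falling between source points.
dst_x = np.array([0.4, 1.6, 2.6])

# Linear interpolation (1-D analogue of bilinear) blends neighboring categories,
# producing values such as 0.4 and about 3.4 that are not valid categories.
linear = np.interp(dst_x, src_x, src_cat)

# Nearest neighbor: take the value of the closest source point, so the result
# only contains categories that exist in the source data.
idx = np.abs(dst_x[:, None] - src_x[None, :]).argmin(axis=1)
nearest = src_cat[idx]

print("linear: ", linear)
print("nearest:", nearest)
```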
Guidelines for tutorials/examples:
- Full reproducibility is required. NCL has a wonderful page of regridding examples, but I have trouble running most scripts due to missing data. For the xESMF docs, I only use data from `xr.tutorial.load_dataset()` or data computed on the fly. Small data used in an example (say, less than 20 MB?) can be submitted and added to an xESMF-data repo, just like the xarray-data repo. For large data, a stable link must be provided. The Pangeo platform on GCP/AWS seems a good place for hosting large data.
- A brief introduction to the scientific problem and why regridding is needed would be very useful.
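One way to get data "computed on the fly" is a small deterministic generator. The sketch below is hypothetical (make_test_field is not an xESMF function): it builds the cell centers of a regular global grid and a smooth analytic field, so a notebook using it runs anywhere without downloads.

```python
import numpy as np

def make_test_field(nlat=45, nlon=90):
    """Return (lat, lon, field) on a regular global grid, computed on the fly.

    Hypothetical helper: fully deterministic and download-free.
    """
    dlat = 180.0 / nlat
    dlon = 360.0 / nlon
    # Cell-center coordinates in degrees.
    lat = -90.0 + dlat * (np.arange(nlat) + 0.5)
    lon = -180.0 + dlon * (np.arange(nlon) + 0.5)
    lon2d, lat2d = np.meshgrid(lon, lat)  # both have shape (nlat, nlon)
    # Smooth analytic field; values lie in [1, 3].
    field = 2.0 + np.cos(np.deg2rad(lat2d)) ** 2 * np.cos(2.0 * np.deg2rad(lon2d))
    return lat, lon, field

lat, lon, field = make_test_field()
print(field.shape)
```

To follow the stick-to-xarray advice, the arrays can be wrapped as `xr.Dataset({"data": (("lat", "lon"), field)}, coords={"lat": lat, "lon": lon})`.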
2. Contribution to standalone, small utilities
Many issues on GitHub belong to “small utilities” (e.g. #15, #16). They do not have a large impact on the core regridding, but are crucial for usability and user experience. Developing those small utilities is much easier than hacking the regridding core, and they often do not require ESMF/ESMPy knowledge. It is also much easier for me to handle dependencies.
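As a hypothetical example of a utility that needs no ESMF knowledge (centers_to_bounds is not an existing xESMF function): conservative regridding needs cell boundaries in addition to cell centers (xESMF expects them as lon_b/lat_b), and for a 1-D monotonic coordinate they can be estimated with plain NumPy.

```python
import numpy as np

def centers_to_bounds(centers):
    """Estimate cell boundaries from 1-D, monotonic cell centers.

    Interior boundaries are midpoints between neighboring centers; the two
    outer boundaries are extrapolated by half of the adjacent cell width.
    Returns an array one element longer than the input.
    """
    c = np.asarray(centers, dtype=float)
    if c.ndim != 1 or c.size < 2:
        raise ValueError("need a 1-D array with at least two centers")
    mid = 0.5 * (c[:-1] + c[1:])
    first = c[0] - (mid[0] - c[0])
    last = c[-1] + (c[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])

# 4-degree latitude centers from -88 to 88 give bounds from -90 to 90.
print(centers_to_bounds(np.arange(-88.0, 90.0, 4.0)))
```

For latitude on an irregular grid, the extrapolated outer bounds can fall outside [-90, 90] and may need clipping, e.g. with `np.clip`.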
General principles:
- The functionality should be closely related to regridding. An example is computing cell area, which is useful for checking mass conservation before/after conservative regridding. Code that constructs a particular grid mesh (unless it is an extremely common one, like a regular lat-lon grid) should go into examples, not utilities.
- Avoid complicated data structures. Stick to `xarray.Dataset` and numpy whenever possible. Compatibility with pure numpy arrays is encouraged.
- Minimize dependencies on other functions, especially other “small utilities”. xESMF is still young and the code base is in flux.
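The cell-area utility mentioned above could look roughly like this. It is a hypothetical, NumPy-only sketch (the names cell_area and total_mass and the Earth radius constant are assumptions, not part of xESMF) and it covers regular lat-lon grids only; curvilinear grids need a different formula. The usage part coarsens a field by area-weighted averaging of 2x2 blocks, a stand-in for conservative regridding rather than xESMF's algorithm, and then checks that the area-weighted total is unchanged.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius in meters

def cell_area(lon_b, lat_b, radius=EARTH_RADIUS_M):
    """Area of each cell of a regular lat-lon grid on a sphere.

    lon_b, lat_b: 1-D cell boundaries in degrees (lengths nlon+1 and nlat+1).
    Uses A = R**2 * (lon_e - lon_w) * (sin(lat_n) - sin(lat_s)) with angles in
    radians, which is exact for cells bounded by meridians and parallels.
    Returns an array of shape (nlat, nlon).
    """
    dlon = np.diff(np.deg2rad(np.asarray(lon_b, dtype=float)))
    dsinlat = np.diff(np.sin(np.deg2rad(np.asarray(lat_b, dtype=float))))
    return radius**2 * np.outer(dsinlat, dlon)

def total_mass(field, area):
    """Area-weighted integral (total mass if field is mass per unit area)."""
    return float(np.sum(field * area))

# 5-degree fine grid (36 x 72 cells) and nested 10-degree coarse grid (18 x 36).
area_fine = cell_area(np.linspace(-180, 180, 73), np.linspace(-90, 90, 37))
area_coarse = cell_area(np.linspace(-180, 180, 37), np.linspace(-90, 90, 19))

field_fine = np.random.default_rng(0).random(area_fine.shape)
# Sum mass over each 2x2 block of fine cells, then divide by the coarse area.
mass_blocks = (field_fine * area_fine).reshape(18, 2, 36, 2).sum(axis=(1, 3))
field_coarse = mass_blocks / area_coarse

print(total_mass(field_fine, area_fine), total_mass(field_coarse, area_coarse))
```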
3. Contribution to core functionalities
I would very much welcome hard-core xarray/dask/ESMF/Pangeo developers to tackle some of the most challenging problems. For example:
- Out-of-core, parallel (#3), and even distributed (pangeo-data/pangeo#334, #26) regridding
- Unstructured grid (#18)
For such big questions, it is better to discuss on GitHub before starting serious coding.
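Part of what makes out-of-core and parallel regridding tractable is that horizontal regridding is a linear map: once the weights are known, every 2-D field (flattened to a vector) is multiplied by the same weight matrix, so the data can be split along non-horizontal dimensions such as time and processed chunk by chunk. Below is a minimal NumPy sketch, assuming the weights are given as (row, col, value) triplets; the function names are hypothetical and this is not xESMF's internal code. The matrix is dense only to keep the example dependency-free; a real implementation would keep it sparse.

```python
import numpy as np

def weights_to_matrix(rows, cols, vals, n_out, n_in):
    """Build an (n_out, n_in) weight matrix from COO triplets."""
    W = np.zeros((n_out, n_in))
    np.add.at(W, (rows, cols), vals)  # sums duplicate (row, col) entries
    return W

def regrid_in_chunks(W, data, chunk_size):
    """Apply W to data of shape (ntime, n_in), one time chunk at a time.

    Each horizontal field is regridded independently, so chunks along the
    leading axis could be processed out of core or in parallel.
    """
    out = np.empty((data.shape[0], W.shape[0]))
    for start in range(0, data.shape[0], chunk_size):
        stop = start + chunk_size
        out[start:stop] = data[start:stop] @ W.T
    return out

# Toy weights: 4 source cells -> 2 destination cells, each the mean of two sources.
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 2, 3])
vals = np.array([0.5, 0.5, 0.5, 0.5])
W = weights_to_matrix(rows, cols, vals, n_out=2, n_in=4)

data = np.arange(20, dtype=float).reshape(5, 4)  # 5 time steps, 4 flattened source cells
chunked = regrid_in_chunks(W, data, chunk_size=2)
full = data @ W.T  # regridding everything at once
print(np.allclose(chunked, full))
```

A dask-based version would map the same per-chunk product over the chunks of a dask array; the sketch only shows why that decomposition gives the same answer as regridding everything at once.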
When & Where to start
I am still planning a significant refactor of the code base, to better support critical features, notably dask support (#3), accepting `Dataset` objects (#5), and retrieving weights (#11). (It is moving slowly because xESMF is my personal, unfunded side project😐. I have a lot of other projects on hand. My life would probably be easier if I wrote a GMD/JORS paper on it, so it could count towards my PhD…) At this stage, hacking the core might not be the best choice, because it is very likely to change (I mean the internal code, not the user API). Contributing examples, tutorials, and use cases is the safe bet.
TODO:
- Add a Contributor Guide to online docs
(Issue created 5 years ago.)
Top GitHub Comments
Something that may be of interest on the xESMF roadmap is to move the repo to another namespace. xarray-contrib comes to mind as an obvious option. This may help increase the likelihood of gaining outside contributors and give the package a more prominent platform. This is really just semantics at this point, but it is something to think about.
Note that we would be very happy to add more examples to xarray-data to round out our current set (which is quite limited).