Strategies for indexing and selecting
I’m working with NEMO output wrapped into an xarray dataset understood by xgcm.Grid. (And it is great to finally be able to just focus on the calculation at hand without all the C-grid boilerplate that’s usually necessary to handle the data!)
When working with these datasets, I struggle to find a simple way to subset the data consistently across the different dimensions that belong to the same axis. With a dataset like this:
print(ds)
<xarray.Dataset>
Dimensions: (t: 24, x_c: 720, x_r: 720, y_c: 509, y_r: 509, z_c: 46, z_l: 46)
Coordinates:
* z_c (z_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
* z_l (z_l) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 ...
* y_c (y_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
* y_r (y_r) float64 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 ...
* x_c (x_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
* x_r (x_r) float64 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 ...
depth_c (z_c) float64 dask.array<shape=(46,), chunksize=(23,)>
depth_l (z_l) float64 dask.array<shape=(46,), chunksize=(23,)>
llat_cc (y_c, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
llat_cr (y_c, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
llat_rc (y_r, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
llat_rr (y_r, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
llon_cc (y_c, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
llon_cr (y_c, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
llon_rc (y_r, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
llon_rr (y_r, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
* t (t) datetime64[ns] 2008-01-16T12:00:00 2008-02-15 ...
Data variables:
T (t, z_c, y_c, x_c) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>
U (t, z_c, y_c, x_r) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>
V (t, z_c, y_r, x_c) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>
where T lives on the tracer grid points (t, z_c, y_c, x_c), U lives on the zonal-velocity grid points (t, z_c, y_c, x_r), and V lives on the meridional-velocity grid points (t, z_c, y_r, x_c). To subset to, e.g., 30°S…30°N, I’d naively go for something like:
# Latitude masks, each evaluated at its own grid position
sel_llat_cc = (abs(ds.coords["llat_cc"]) <= 30)
sel_llat_cr = (abs(ds.coords["llat_cr"]) <= 30)
sel_llat_rc = (abs(ds.coords["llat_rc"]) <= 30)
# Apply each mask to the variable living on that grid
T_sel = ds["T"].where(sel_llat_cc, drop=True)
U_sel = ds["U"].where(sel_llat_cr, drop=True)
V_sel = ds["V"].where(sel_llat_rc, drop=True)
but this does not necessarily result in equally shaped T_sel, U_sel, and V_sel.
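Why the shapes can differ: where(..., drop=True) drops every row and column along which the mask is entirely False, and each mask is evaluated at its own grid position. A V row half a cell north of an excluded tracer row can therefore still fall inside the band. A minimal sketch on a hypothetical 6×4 toy grid (the toy_lat helper and its 15°-per-row latitudes are made up for illustration, not taken from the NEMO data):

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid: 6 rows x 4 columns; V rows sit half a cell after the tracer rows
ny, nx = 6, 4
y_c = np.arange(1, ny + 1)  # tracer rows 1..6
y_r = y_c + 0.5             # V rows 1.5..6.5
x_c = np.arange(1, nx + 1)


def toy_lat(y):
    """Hypothetical latitude field: 15 degrees per row, constant along x."""
    return np.repeat((15.0 * (y - 3.5))[:, None], nx, axis=1)


ds = xr.Dataset(
    data_vars={
        "T": (("y_c", "x_c"), np.ones((ny, nx))),
        "V": (("y_r", "x_c"), np.ones((ny, nx))),
    },
    coords={
        "y_c": y_c,
        "y_r": y_r,
        "x_c": x_c,
        "llat_cc": (("y_c", "x_c"), toy_lat(y_c)),
        "llat_rc": (("y_r", "x_c"), toy_lat(y_r)),
    },
)

# Each mask is evaluated at its own grid position, so drop=True keeps a
# different number of rows: T drops y_c=1 (-37.5) but V keeps y_r=1.5 (-30.0)
T_sel = ds["T"].where(abs(ds["llat_cc"]) <= 30, drop=True)
V_sel = ds["V"].where(abs(ds["llat_rc"]) <= 30, drop=True)

print(T_sel.sizes["y_c"], V_sel.sizes["y_r"])  # 4 5
```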
Constructing the selectors for the U and V grids from the tracer-grid selector is conceivable, but not very elegant:
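sel_llat_cc = (abs(ds.coords["llat_cc"]) <= 30)
# Copy the tracer mask onto the U grid (same shape, dims (y_c, x_r))
sel_llat_cr = ds.llat_cr.astype("bool")
sel_llat_cr.data = sel_llat_cc.data
# ...and onto the V grid (dims (y_r, x_c))
sel_llat_rc = ds.llat_rc.astype("bool")
sel_llat_rc.data = sel_llat_cc.data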
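A more compact variant of the same idea, assuming (as the snippet above does) that position i on x_r and y_r belongs to the same cell as position i on x_c and y_c: DataArray.copy(data=...) keeps the target grid's dimensions and coordinates and swaps in the tracer-mask values, so each astype/assign pair becomes one call. Passing .data rather than .values should also avoid pulling a dask-backed mask into memory. The toy grid and its latitudes are hypothetical:

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid using the same dimension names as the issue's dataset
ny, nx = 6, 4
y_c = np.arange(1, ny + 1)
x_c = np.arange(1, nx + 1)
lat_c = np.repeat((15.0 * (y_c - 3.5))[:, None], nx, axis=1)
lat_r = lat_c + 7.5  # hypothetical latitude of the V rows, half a row further north

ds = xr.Dataset(
    data_vars={
        "T": (("y_c", "x_c"), np.ones((ny, nx))),
        "U": (("y_c", "x_r"), np.ones((ny, nx))),
        "V": (("y_r", "x_c"), np.ones((ny, nx))),
    },
    coords={
        "y_c": y_c, "y_r": y_c + 0.5, "x_c": x_c, "x_r": x_c + 0.5,
        "llat_cc": (("y_c", "x_c"), lat_c),
        "llat_cr": (("y_c", "x_r"), lat_c),
        "llat_rc": (("y_r", "x_c"), lat_r),
    },
)

sel_llat_cc = abs(ds["llat_cc"]) <= 30

# Reuse the tracer mask on the U and V grids: copy() keeps the target grid's
# dims and coordinates and takes the tracer-mask values position by position
sel_llat_cr = ds["llat_cr"].copy(data=sel_llat_cc.data)
sel_llat_rc = ds["llat_rc"].copy(data=sel_llat_cc.data)

T_sel = ds["T"].where(sel_llat_cc, drop=True)
U_sel = ds["U"].where(sel_llat_cr, drop=True)
V_sel = ds["V"].where(sel_llat_rc, drop=True)
print(T_sel.shape, U_sel.shape, V_sel.shape)  # (4, 4) (4, 4) (4, 4)
```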
How do you usually solve this?
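One possible approach, sketched here rather than taken from the thread: derive a single set of integer row positions from the tracer grid and apply it with isel to every dimension on that axis. Because y_c and y_r have the same length and y_r[i] sits half a cell after y_c[i], the same positions keep T, U and V consistently shaped. The select_lat_band helper and the toy dataset are hypothetical:

```python
import numpy as np
import xarray as xr


def select_lat_band(ds, lat_max=30.0):
    """Hypothetical helper: keep the rows whose tracer points touch |lat| <= lat_max.

    Assumes y_r[i] belongs to the same cell as y_c[i] (equal lengths,
    "right" shift), so one set of integer rows serves both Y dimensions.
    """
    inside = abs(ds["llat_cc"]) <= lat_max
    # Rows with at least one tracer point inside the band (handles curved rows)
    rows = np.flatnonzero(inside.any(dim="x_c").values)
    return ds.isel(y_c=rows, y_r=rows)


# Hypothetical toy grid with the same dimension names as the issue's dataset
ny, nx = 6, 4
y_c = np.arange(1, ny + 1)
x_c = np.arange(1, nx + 1)
lat_c = np.repeat((15.0 * (y_c - 3.5))[:, None], nx, axis=1)
ds = xr.Dataset(
    data_vars={
        "T": (("y_c", "x_c"), np.ones((ny, nx))),
        "U": (("y_c", "x_r"), np.ones((ny, nx))),
        "V": (("y_r", "x_c"), np.ones((ny, nx))),
    },
    coords={"y_c": y_c, "y_r": y_c + 0.5, "x_c": x_c, "x_r": x_c + 0.5,
            "llat_cc": (("y_c", "x_c"), lat_c)},
)

band = select_lat_band(ds)
print(band["T"].shape, band["U"].shape, band["V"].shape)  # (4, 4) (4, 4) (4, 4)
```

On a curvilinear NEMO grid, rows are not lines of constant latitude, so the kept rows may still contain points outside the band; a follow-up .where() with the per-grid masks (without drop=True) can blank those out while leaving the shapes unchanged. The same pattern extends to the X axis with x_c and x_r.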
Top GitHub Comments
FYI, I have a PR in progress (#93) which will bypass the need to set these attributes and just let you pass the axis information directly to Grid as a dictionary.

Yes.