Strategies for indexing and selecting

I’m working with NEMO output wrapped into an xarray dataset understood by xgcm.Grid. (And it is great to finally be able to just focus on the calculation at hand without all the C-grid boilerplate that’s usually necessary to handle the data!)

In working with these datasets, I struggle to find a simple way of subsetting the data across different dimensions that belong to the same axis. With a dataset like this:

print(ds)

<xarray.Dataset>
Dimensions:   (t: 24, x_c: 720, x_r: 720, y_c: 509, y_r: 509, z_c: 46, z_l: 46)
Coordinates:
  * z_c       (z_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * z_l       (z_l) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 ...
  * y_c       (y_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * y_r       (y_r) float64 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 ...
  * x_c       (x_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x_r       (x_r) float64 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 ...
    depth_c   (z_c) float64 dask.array<shape=(46,), chunksize=(23,)>
    depth_l   (z_l) float64 dask.array<shape=(46,), chunksize=(23,)>
    llat_cc   (y_c, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llat_cr   (y_c, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llat_rc   (y_r, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llat_rr   (y_r, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_cc   (y_c, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_cr   (y_c, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_rc   (y_r, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_rr   (y_r, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
  * t         (t) datetime64[ns] 2008-01-16T12:00:00 2008-02-15 ...
Data variables:
    T  (t, z_c, y_c, x_c) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>
    U  (t, z_c, y_c, x_r) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>
    V  (t, z_c, y_r, x_c) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>

Here, T lives on the tracer grid points (t, z_c, y_c, x_c), U lives on the zonal-velocity grid points (t, z_c, y_c, x_r), and V lives on the meridional-velocity grid points (t, z_c, y_r, x_c). To subset to, e.g., 30°S…30°N, I’d naively go for something like

sel_llat_cc = (abs(ds.coords["llat_cc"]) <= 30)
sel_llat_cr = (abs(ds.coords["llat_cr"]) <= 30)
sel_llat_rc = (abs(ds.coords["llat_rc"]) <= 30)

T_sel = ds.T.where(sel_llat_cc, drop=True)
U_sel = ds.U.where(sel_llat_cr, drop=True)
V_sel = ds.V.where(sel_llat_rc, drop=True)

but this does not necessarily result in equally shaped T_sel, U_sel, and V_sel.
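The shape mismatch can be reproduced on a tiny synthetic dataset. The sketch below is an illustration only: the grid size, the latitude values, and the half-cell offset of 5° are made-up assumptions, not NEMO data. It uses ds["T"] rather than ds.T to avoid any clash with a transpose attribute.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid: 9 rows, 4 columns, latitudes every 10 degrees.
ny, nx = 9, 4
lat_c = np.linspace(-40.0, 40.0, ny)  # tracer rows: -40, -30, ..., 40
lat_r = lat_c + 5.0                   # V rows shifted by half a cell

ds = xr.Dataset(
    data_vars={
        "T": (("y_c", "x_c"), np.ones((ny, nx))),
        "V": (("y_r", "x_c"), np.ones((ny, nx))),
    },
    coords={
        "y_c": np.arange(1, ny + 1),
        "y_r": np.arange(1, ny + 1) + 0.5,
        "x_c": np.arange(1, nx + 1),
        "llat_cc": (("y_c", "x_c"), np.repeat(lat_c[:, None], nx, axis=1)),
        "llat_rc": (("y_r", "x_c"), np.repeat(lat_r[:, None], nx, axis=1)),
    },
)

# Each mask is evaluated on its own grid, so drop=True removes a
# different number of rows for T and for V.
T_sel = ds["T"].where(abs(ds["llat_cc"]) <= 30, drop=True)
V_sel = ds["V"].where(abs(ds["llat_rc"]) <= 30, drop=True)

print(T_sel.shape, V_sel.shape)
```

In this toy case the tracer rows at exactly ±30° fall inside the band, while the shifted V rows at ±35° fall outside, so the two selections end up with a different number of rows.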

Constructing the selectors for the U and V grids from the tracer grid is conceivable, but not very elegant:

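# Selector on the tracer grid
sel_llat_cc = (abs(ds.coords["llat_cc"]) <= 30)

# Reuse the tracer-grid mask values on the U grid (y_c, x_r)
sel_llat_cr = ds.llat_cr.astype("bool")
sel_llat_cr.data = sel_llat_cc.data

# Reuse the tracer-grid mask values on the V grid (y_r, x_c)
sel_llat_rc = ds.llat_rc.astype("bool")
sel_llat_rc.data = sel_llat_cc.data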

How do you usually solve this?
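One conceivable alternative, sketched here purely as an illustration and not taken from the thread, is to derive a positional index range from the tracer-grid mask and apply the same range to both dimensions of the axis with isel. The toy dataset and the names rows, y_slice, and subset are assumptions made for this example.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid (same layout idea as the dataset above, much smaller).
ny, nx = 9, 4
lat_c = np.linspace(-40.0, 40.0, ny)
lat_r = lat_c + 5.0

ds = xr.Dataset(
    data_vars={
        "T": (("y_c", "x_c"), np.ones((ny, nx))),
        "V": (("y_r", "x_c"), np.ones((ny, nx))),
    },
    coords={
        "y_c": np.arange(1, ny + 1),
        "y_r": np.arange(1, ny + 1) + 0.5,
        "x_c": np.arange(1, nx + 1),
        "llat_cc": (("y_c", "x_c"), np.repeat(lat_c[:, None], nx, axis=1)),
        "llat_rc": (("y_r", "x_c"), np.repeat(lat_r[:, None], nx, axis=1)),
    },
)

# Positional row indices where at least one tracer point lies in the band.
mask_cc = abs(ds["llat_cc"]) <= 30
rows = np.flatnonzero(mask_cc.any("x_c").values)

# Apply the same positional slice to both y dimensions, so that all
# variables keep matching shapes regardless of their grid position.
y_slice = slice(int(rows.min()), int(rows.max()) + 1)
subset = ds.isel(y_c=y_slice, y_r=y_slice)

print(subset["T"].shape, subset["V"].shape)
```

The trade-off is that the V rows are chosen by index rather than by their own latitude, so the outermost V row can lie slightly outside the requested band.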

Top GitHub Comments

rabernat commented, Mar 16, 2018

FYI, I have a PR in progress (#93) which will bypass the need to set these attributes and just let you pass the axis information directly to Grid as a dictionary.
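For the dataset in the question, such a dictionary might look roughly like the sketch below. It assumes the coords keyword of xgcm.Grid as found in released xgcm versions. The axis names X, Y, Z and the choice of "right" and "left" positions are inferred from the coordinate values shown above (x_r and y_r sit half a cell after the centers, z_l half a cell before) and are assumptions, as is the helper name build_grid.

```python
# Axis name -> {grid position: dimension name}; positions are assumptions
# inferred from the coordinate values in the dataset printout.
axis_coords = {
    "X": {"center": "x_c", "right": "x_r"},
    "Y": {"center": "y_c", "right": "y_r"},
    "Z": {"center": "z_c", "left": "z_l"},
}


def build_grid(ds):
    """Hypothetical helper: build an xgcm.Grid from the dictionary above."""
    import xgcm  # imported lazily so the dictionary is usable without xgcm

    return xgcm.Grid(ds, coords=axis_coords)
```

Calling build_grid(ds) on the dataset would then return a grid object without any axis attributes having to be set on the coordinates first.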

willirath commented, Jul 30, 2018

Yes.