Strategies for indexing and selecting

I’m working with NEMO output wrapped into an xarray dataset understood by xgcm.Grid. (And it is great to finally be able to just focus on the calculation at hand without all the C-grid boilerplate that’s usually necessary to handle the data!)

In working with these datasets, I struggle to find a simple way of subsetting the data across different dimensions belonging to the same axis. Take a dataset like this:

print(ds)
<xarray.Dataset>
Dimensions:   (t: 24, x_c: 720, x_r: 720, y_c: 509, y_r: 509, z_c: 46, z_l: 46)
Coordinates:
  * z_c       (z_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * z_l       (z_l) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 ...
  * y_c       (y_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * y_r       (y_r) float64 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 ...
  * x_c       (x_c) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x_r       (x_r) float64 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 ...
    depth_c   (z_c) float64 dask.array<shape=(46,), chunksize=(23,)>
    depth_l   (z_l) float64 dask.array<shape=(46,), chunksize=(23,)>
    llat_cc   (y_c, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llat_cr   (y_c, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llat_rc   (y_r, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llat_rr   (y_r, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_cc   (y_c, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_cr   (y_c, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_rc   (y_r, x_c) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
    llon_rr   (y_r, x_r) float32 dask.array<shape=(509, 720), chunksize=(509, 720)>
  * t         (t) datetime64[ns] 2008-01-16T12:00:00 2008-02-15 ...
Data variables:
    T  (t, z_c, y_c, x_c) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>
    U  (t, z_c, y_c, x_r) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>
    V  (t, z_c, y_r, x_c) float32 dask.array<shape=(24, 46, 509, 720), chunksize=(1, 23, 509, 720)>

where T lives on the tracer grid points (t, z_c, y_c, x_c), U lives on the zonal-velocity grid points (t, z_c, y_c, x_r), and V lives on the meridional-velocity grid points (t, z_c, y_r, x_c). To subset all three to, e.g., 30°S…30°N, I’d naively go for something like

sel_llat_cc = (abs(ds.coords["llat_cc"]) <= 30)
sel_llat_cr = (abs(ds.coords["llat_cr"]) <= 30)
sel_llat_rc = (abs(ds.coords["llat_rc"]) <= 30)

T_sel = ds.T.where(sel_llat_cc, drop=True)
U_sel = ds.U.where(sel_llat_cr, drop=True)
V_sel = ds.V.where(sel_llat_rc, drop=True)

but this does not necessarily result in equally shaped T_sel, U_sel, and V_sel.

Constructing the selectors for the U and V grids from the tracer-grid selector is conceivable, but not very elegant:

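# Boolean mask on the tracer grid (y_c, x_c)
sel_llat_cc = (abs(ds.coords["llat_cc"]) <= 30)

# Boolean array with the U-grid dims (y_c, x_r), filled with the tracer-grid mask
sel_llat_cr = ds.llat_cr.astype("bool")
sel_llat_cr.data = sel_llat_cc.data

# Boolean array with the V-grid dims (y_r, x_c), filled with the tracer-grid mask
sel_llat_rc = ds.llat_rc.astype("bool")
sel_llat_rc.data = sel_llat_cc.data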
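
Another option, shown here only as a rough sketch, is to turn the tracer-grid mask into positional index ranges and apply the same ranges to every dimension on the same axis with `Dataset.isel`. The small synthetic dataset, the helper `index_slices_from_mask`, and the `same_axis` mapping are all assumptions made for illustration; they are not part of xgcm or of the dataset above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset above; sizes and latitudes are made up.
ny, nx = 6, 4
lat_c = np.linspace(-50.0, 50.0, ny)  # latitudes of the y_c points
lat_r = lat_c + 10.0                  # y_r points sit half a cell further north

ds = xr.Dataset(
    data_vars={
        "T": (("y_c", "x_c"), np.zeros((ny, nx))),
        "U": (("y_c", "x_r"), np.zeros((ny, nx))),
        "V": (("y_r", "x_c"), np.zeros((ny, nx))),
    },
    coords={
        "y_c": np.arange(1, ny + 1),
        "y_r": np.arange(1, ny + 1) + 0.5,
        "x_c": np.arange(1, nx + 1),
        "x_r": np.arange(1, nx + 1) + 0.5,
        "llat_cc": (("y_c", "x_c"), np.repeat(lat_c[:, None], nx, axis=1)),
        "llat_rc": (("y_r", "x_c"), np.repeat(lat_r[:, None], nx, axis=1)),
    },
)


def index_slices_from_mask(mask):
    """Hypothetical helper: positional bounding box of the True values in `mask`."""
    slices = {}
    for dim in mask.dims:
        other_dims = [d for d in mask.dims if d != dim]
        hits = np.flatnonzero(mask.any(dim=other_dims).values)
        slices[dim] = slice(int(hits[0]), int(hits[-1]) + 1)
    return slices


# Naive approach: each grid gets its own mask, so the shapes can differ.
mask_cc = abs(ds["llat_cc"]) <= 30
naive_T = ds["T"].where(mask_cc, drop=True)
naive_V = ds["V"].where(abs(ds["llat_rc"]) <= 30, drop=True)

# Index-based approach: derive the ranges once on the tracer grid ...
box = index_slices_from_mask(mask_cc)

# ... and reuse them for all dims on the same axis (hand-written mapping).
same_axis = {"y_c": ("y_c", "y_r"), "x_c": ("x_c", "x_r")}
indexers = {dim: box[src] for src, dims in same_axis.items() for dim in dims}

subset = ds.isel(indexers)
print(naive_T.shape, naive_V.shape)
print(subset["T"].shape, subset["U"].shape, subset["V"].shape)
```

Because `isel` works on positions rather than coordinate values, T, U, and V keep matching shapes. The trade-off is that the selection is a rectangular box in index space, so points inside the box that fail the latitude test are kept rather than masked.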

How do you usually solve this?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

rabernat commented, Mar 16, 2018

FYI, I have a PR in progress (#93) which will bypass the need to set these attributes and just let you pass the axis information directly to Grid as a dictionary.

willirath commented, Jul 30, 2018

Yes.
