How to broadcast along dayofyear

Here is the problem I’m trying to solve, for which I have not found a solution. I have a spatial dataset with the following format (temperature over time on a spatial grid):

Dimensions:    (depth: 1, latitude: 481, longitude: 781, time: 730)
Coordinates:
  * time       (time) datetime64[ns] 2016-01-01T11:59:37.961193472 ...
  * longitude  (longitude) float32 -75.0 -74.9167 -74.8333 -74.75 -74.6667 ...
  * latitude   (latitude) float32 25.0 25.0833 25.1667 25.25 25.3333 25.4167 ...
  * depth      (depth) float32 0.494025
Data variables:
    thetao     (time, depth, latitude, longitude) float64 ...
Attributes:
    title:            daily mean fields from Global Ocean Physics Analysis an...
    institution:      MERCATOR OCEAN
    references:       http://www.mercator-ocean.fr
    source:           MERCATOR PSY4QV3R1
    Conventions:      CF-1.0
    history:          Data extracted from dataset http://opendap-glo.mercator...
    time_min:         578556.0
    time_max:         596052.0
    julian_day_unit:  hours since 1950-01-01 00:00:00
    z_min:            0.494024991989
    z_max:            0.494024991989
    latitude_min:     25.0
    latitude_max:     65.0
    longitude_min:    -75.0
    longitude_max:    -10.0

These are all temperature values. From another source I receive specially calibrated means and standard deviations of temperature for every day of the year. The mean dataset (the std dataset has the same structure) looks like this:

Dimensions:    (dayofyear: 366, depth: 1, latitude: 481, longitude: 781)
Coordinates:
  * longitude  (longitude) float32 -75.0 -74.9167 -74.8333 -74.75 -74.6667 ...
  * latitude   (latitude) float32 25.0 25.0833 25.1667 25.25 25.3333 25.4167 ...
  * depth      (depth) float32 0.494025
  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    thetao     (dayofyear, depth, latitude, longitude) float64 25.06 25.16 ...

What I want to achieve is to construct from the temperatures dataset a new one with standardized temperatures. The issue is I do not know how. The first thing I thought is to do something like:

new_dset = (dset.groupby("time.dayofyear") - mean) / std

However, the issue now is that I can’t undo the “groupby”. So far I have not managed to figure out either how to undo it or, more generally, what the right approach is for the operation I want.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 16 (10 by maintainers)

Top GitHub Comments

5 reactions
spencerkclark commented, Sep 3, 2018

No worries @chiaral; I agree on the xarray side this isn’t so well documented (you have to follow the link to the pandas description of the datetime components).

Unfortunately there is not a simple attribute for grouping by matching month and day. It is possible to define your own vector of integers for this purpose, however. Perhaps you’ve already found a workaround, but just in case, here is one way to define a “modified ordinal day” that you can use in a groupby call:

In [1]: import xarray as xr

In [2]: from datetime import datetime

In [3]: dates = [datetime(1999, 1, 1), datetime(1999, 3, 1),
   ...:          datetime(2000, 1, 1), datetime(2000, 3, 1)]
   ...:

In [4]: da = xr.DataArray([1, 2, 3, 4], coords=[dates], dims=['time'])

In [5]: not_leap_year = xr.DataArray(~da.indexes['time'].is_leap_year, coords=da.coords)

In [6]: march_or_later = da.time.dt.month >= 3

In [7]: ordinal_day = da.time.dt.dayofyear

In [8]: modified_ordinal_day = ordinal_day + (not_leap_year & march_or_later)

In [9]: modified_ordinal_day = modified_ordinal_day.rename('modified_ordinal_day')

In [10]: modified_ordinal_day
Out[10]:
<xarray.DataArray 'modified_ordinal_day' (time: 4)>
array([ 1, 61,  1, 61])
Coordinates:
  * time     (time) datetime64[ns] 1999-01-01 1999-03-01 2000-01-01 2000-03-01

In [11]: da.groupby(modified_ordinal_day).mean('time')
Out[11]:
<xarray.DataArray (modified_ordinal_day: 2)>
array([2., 3.])
Coordinates:
  * modified_ordinal_day  (modified_ordinal_day) int64 1 61

Note that if we use the standard ordinal day, we get three groups, because March 1 is day 60 in a non-leap year but day 61 in a leap year:

In [12]: ordinal_day
Out[12]:
<xarray.DataArray 'dayofyear' (time: 4)>
array([ 1, 60,  1, 61])
Coordinates:
  * time     (time) datetime64[ns] 1999-01-01 1999-03-01 2000-01-01 2000-03-01

In [13]: da.groupby(ordinal_day).mean('time')
Out[13]:
<xarray.DataArray (dayofyear: 3)>
array([2., 2., 4.])
Coordinates:
  * dayofyear  (dayofyear) int64 1 60 61
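
The steps In [5] to In [9] above can be wrapped into a small helper. This is only a sketch: the function name `modified_ordinal_day` and the sample dates are hypothetical, and it relies on the `.dt.is_leap_year` accessor, which assumes a reasonably recent xarray and a datetime64 time coordinate.

```python
import pandas as pd
import xarray as xr


def modified_ordinal_day(time):
    """Return a day-of-year label where each calendar date maps to the
    same integer in leap and non-leap years (March 1 is always 61)."""
    not_leap_year = ~time.dt.is_leap_year
    march_or_later = time.dt.month >= 3
    # In non-leap years, shift every day from March 1 onward by one so
    # that it lines up with the leap-year numbering.
    shifted = time.dt.dayofyear + (not_leap_year & march_or_later)
    return shifted.rename("modified_ordinal_day")


# Hypothetical sample dates covering both kinds of year.
time = xr.DataArray(
    pd.to_datetime(
        ["1999-01-01", "1999-03-01", "2000-02-29", "2000-03-01", "1999-12-31"]
    ),
    dims="time",
    name="time",
)
labels = modified_ordinal_day(time)
print(labels.values)
```

With this labelling, the values run from 1 to 366 in every year, and 60 (February 29) simply has no members in non-leap years.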

3 reactions
spencerkclark commented, Sep 3, 2018

Building on the above example, if you’re OK with using a coordinate of strings, the following might be a slightly simpler way of defining the labels to use for grouping (this is perhaps closer to a single-attribute solution):

In [14]: month_day_str = xr.DataArray(da.indexes['time'].strftime('%m-%d'), coords=da.coords,
    ...:                              name='month_day_str')
    ...:

In [15]: da.groupby(month_day_str).mean('time')
Out[15]:
<xarray.DataArray (month_day_str: 2)>
array([2., 3.])
Coordinates:
  * month_day_str  (month_day_str) object '01-01' '03-01'

Note #2090 / #2144 would make this more straightforward.
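
Coming back to the original question, a minimal sketch (not taken from the thread) suggests that the groupby does not need to be undone: arithmetic between an xarray groupby object and a dataset indexed by the group name returns a result on the original time axis. The synthetic data, the tiny grid and the constant mean and std values below are assumptions for illustration only, and the depth dimension is left out for brevity.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the temperature dataset: two years of daily
# values on a 2 x 3 grid (2016 is a leap year, so 731 days in total).
time = pd.date_range("2016-01-01", "2017-12-31", freq="D")
lat = [25.0, 25.1]
lon = [-75.0, -74.9, -74.8]
rng = np.random.default_rng(0)
dset = xr.Dataset(
    {"thetao": (("time", "latitude", "longitude"),
                rng.normal(15.0, 2.0, (time.size, 2, 3)))},
    coords={"time": time, "latitude": lat, "longitude": lon},
)


def daily_stat(value):
    """Build a constant per-day statistic indexed by dayofyear 1..366."""
    return xr.Dataset(
        {"thetao": (("dayofyear", "latitude", "longitude"),
                    np.full((366, 2, 3), value))},
        coords={"dayofyear": np.arange(1, 367),
                "latitude": lat, "longitude": lon},
    )


mean = daily_stat(15.0)  # assumed calibrated mean
std = daily_stat(2.0)    # assumed calibrated standard deviation

# Each groupby operation matches every time step with the statistic for
# its day of year and returns a dataset along "time" again, so the
# second operation needs its own groupby.
anomalies = dset.groupby("time.dayofyear") - mean
standardized = anomalies.groupby("time.dayofyear") / std

print(standardized["thetao"].dims, standardized.sizes["time"])
```

The result keeps the full time dimension and gains a `dayofyear` coordinate along it. Note that this groups by the standard ordinal day, so the leap-year offset discussed in the comments above still applies from March onward.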
