Extend scope of `alignment="same_verifs"`

The “same_verifs” alignment builds a list of times from verif that are present in the forecast at every lead (from any init). This list will always be empty when the init frequency is lower than the lead frequency. Is there scope to extend “same_verifs” so that it handles such cases appropriately? I’ll try to give a concrete example of what I mean below.

Consider the following hindcasts:

import cftime
import climpred
import numpy as np
import xarray as xr

# Hindcasts initialised every year with monthly lead
init = xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="AS")
lead = range(0, 24)
data = np.random.random((len(init), len(lead)))
hind = xr.DataArray(data, coords=[init, lead], dims=["init", "lead"], name="var")
hind["lead"].attrs["units"] = "months"
hind = climpred.utils.add_time_from_init_lead(hind)

I currently can’t use “same_verifs” with this data because there are no common times available at all leads.
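
To make the empty result concrete, here is a minimal pure-Python sketch of the intersection described above. It only mimics the description of “same_verifs” given in this issue, not climpred's actual code; `month_index` and `label` are hypothetical helpers that represent dates as integer month counts.

```python
def month_index(year, month):
    """Months since year 0, so adding a lead in months is integer addition."""
    return year * 12 + (month - 1)


def label(idx):
    """Format a month index as YYYY-MM."""
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


# Same layout as the hindcasts above: annual inits, monthly leads 0..23.
inits = [month_index(y, 1) for y in (2000, 2001, 2002)]
leads = range(24)

# Valid times available at each lead.
valid_by_lead = {lead: {init + lead for init in inits} for lead in leads}

# Times present at *all* leads (assumed reading of "same_verifs").
common = set.intersection(*valid_by_lead.values())
print(sorted(label(t) for t in common))  # []

# Leads that share a calendar month (0 and 12) do have times in common.
common_0_12 = valid_by_lead[0] & valid_by_lead[12]
print(sorted(label(t) for t in common_0_12))  # ['2001-01', '2002-01']
```

Each lead only ever sees one calendar month, so no valid time can appear at all 24 leads.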

But users may still want to align on a common verification period. For instance, in this example the "valid_time"s [2001-01-01 and 2002-01-01] are available at every lead at which they can occur (leads 0 and 12 months). Similarly,

  • [2001-02-01 and 2002-02-01] are available at leads 1 and 13 months,
  • [2001-03-01 and 2002-03-01] are available at leads 2 and 14 months, …
  • [2001-12-01 and 2002-12-01] are available at leads 11 and 23 months.

That is, by performing verification over the period 2001-01-01 - 2002-12-01 one includes:

  • the same dates at each lead where possible, given the init/lead frequencies
  • the same number of samples at each lead

period = [cftime.DatetimeGregorian(2001, 1, 1), cftime.DatetimeGregorian(2002, 12, 1)]

hind.where(
    np.logical_and(hind["valid_time"] >= period[0], hind["valid_time"] <= period[1])
).plot()

(Screenshot of the resulting plot.)
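
The "same number of samples at each lead" claim can be checked with a small count. This is a sketch in plain Python that assumes the same annual-init, monthly-lead layout and represents dates as integer month counts via a hypothetical `month_index` helper.

```python
def month_index(year, month):
    """Months since year 0, so adding a lead in months is integer addition."""
    return year * 12 + (month - 1)


inits = [month_index(y, 1) for y in (2000, 2001, 2002)]
leads = range(24)

# Verification period from the example: 2001-01 to 2002-12 inclusive.
start, end = month_index(2001, 1), month_index(2002, 12)

# Number of inits whose valid time at a given lead falls inside the period.
samples_per_lead = {
    lead: sum(start <= init + lead <= end for init in inits) for lead in leads
}
print(samples_per_lead)
```

Every lead ends up with two samples: leads 0 to 11 draw on the 2001 and 2002 inits, and leads 12 to 23 draw on the 2000 and 2001 inits.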

How do folks feel about restructuring climpred.alignment._same_verifs_alignment() so that it returns the alignment dates above for this example? We would obviously do this such that the current behaviour is preserved for datasets that have common verification times across all leads.
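
One possible reading of this proposal, sketched in plain Python: keep a verification time if it is present at every lead at which the init grid allows it to occur. The function name `extended_same_verifs`, the `init_step` parameter, and the integer-month representation are all hypothetical and are not part of climpred; the sketch also assumes a regular init grid and leads in whole months.

```python
def month_index(year, month):
    """Months since year 0, so adding a lead in months is integer addition."""
    return year * 12 + (month - 1)


def label(idx):
    """Format a month index as YYYY-MM."""
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def extended_same_verifs(inits, leads, init_step):
    """Hypothetical helper, not climpred API.

    Keep each verification time that is available at every lead at which
    it can occur, given inits spaced init_step months apart.
    """
    init_set = set(inits)
    origin = min(inits)
    candidates = {init + lead for init in inits for lead in leads}
    kept = []
    for t in sorted(candidates):
        # Leads at which t could occur: t - lead must sit on the init grid.
        possible = [l for l in leads if (t - l - origin) % init_step == 0]
        # Keep t only if an init actually exists for each of those leads.
        if all(t - l in init_set for l in possible):
            kept.append(t)
    return kept


inits = [month_index(y, 1) for y in (2000, 2001, 2002)]
kept = extended_same_verifs(inits, range(24), init_step=12)
print(label(kept[0]), label(kept[-1]), len(kept))  # 2001-01 2002-12 24
```

When the init spacing equals the lead spacing, every lead is "possible" for every time, so under these assumptions the rule reduces to the strict intersection across all leads.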

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6

Top GitHub Comments

dougiesquire commented, Dec 6, 2021

> Note: For your example to work you definitely need a monthly observation.

Yes exactly - sorry should’ve made that clearer

> The number of samples isn’t equal but won’t differ by more than +/- 1 IMO.

Good point. I messed that up, sorry. Now I realise there isn’t a single solution to the constraints I’ve posed.

I think there’d be value in an alignment something like what I’m suggesting. But it seems like I still need to work out the best approach for climpred in my head. In the past for my own work I’ve just specified a period over which to verify and kept all dates within that period. I chose this period judiciously to make sure that there are equal numbers of samples at each lead.

Happy to open a PR where I can flesh this out a little better. But it might take me a little while to get to it sorry.

aaronspring commented, Dec 6, 2021

Thanks @dougiesquire. Now I get your approach. So valid_times do not need to match across leads but must fall between an upper and a lower bound. It reminds me a bit of sel(method='nearest'), but with an upper and a lower bound.

Note: For your example to work you definitely need a monthly observation.

So for your second example, the struck-through entries do not verify:

init freq: 3 months, lead freq: 1 month:

| lead 0 | lead 1 | lead 2 | lead 3 |
| --- | --- | --- | --- |
| 2001-10 | 2001-11 | 2001-12 | 2002-01 |
| 2001-07 | 2001-08 | 2001-09 | 2001-10 |
| 2001-04 | 2001-05 | 2001-06 | 2001-07 |
| 2001-01 | 2001-02 | 2001-03 | 2001-04 |

The number of samples isn’t equal but won’t differ by more than +/- 1 IMO. Taking 2001-03 - 2001-11 gives three samples each.
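
The sample counts for this second example can be tallied with a short sketch. It assumes inits every 3 months from 2001-01 to 2001-10 and leads of 0 to 3 months, as in the table above; `month_index` and `samples_per_lead` are hypothetical helpers, and the 2001-04 to 2001-10 period is only an illustrative alternative.

```python
def month_index(year, month):
    """Months since year 0, so adding a lead in months is integer addition."""
    return year * 12 + (month - 1)


inits = [month_index(2001, m) for m in (1, 4, 7, 10)]
leads = range(4)


def samples_per_lead(start, end):
    """Count, per lead, the inits whose valid time lies in [start, end]."""
    return {
        lead: sum(start <= init + lead <= end for init in inits)
        for lead in leads
    }


# Period suggested above: 2001-03 to 2001-11.
print(samples_per_lead(month_index(2001, 3), month_index(2001, 11)))
# {0: 3, 1: 3, 2: 3, 3: 3}

# A slightly different period leaves the counts off by one between leads.
print(samples_per_lead(month_index(2001, 4), month_index(2001, 10)))
# {0: 3, 1: 2, 2: 2, 3: 3}
```

So the choice of bounds decides whether every lead gets the same number of samples or whether the counts differ by one.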

I’d still prefer to make a new alignment keyword, maybe same_verifs_nearest or same_verifs_fill?

Would you lead a PR? The entry point is https://github.com/pangeo-data/climpred/blob/f6e05d18e54b539e991820c5616b0206b5584617/climpred/alignment.py#L125, and I am happy to give feedback and test.
