Extend scope of `alignment="same_verifs"`

The “same_verifs” alignment generates a list of times from verif that are present in the forecast at every lead (at any init). This list will always be empty when the init frequency is lower than the lead frequency. Is there scope to extend “same_verifs” to handle such cases appropriately? I’ll try to give a concrete example of what I mean below.

Consider the following hindcasts:

import cftime
import climpred
import numpy as np
import xarray as xr

# Hindcasts initialised every year with monthly lead
init = xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="AS")
lead = range(0, 24)
data = np.random.random((len(init), len(lead)))
hind = xr.DataArray(data, coords=[init, lead], dims=["init", "lead"], name="var")
hind["lead"].attrs["units"] = "months"
hind = climpred.utils.add_time_from_init_lead(hind)

I currently can’t use “same_verifs” with this data because there are no common times available at all leads.
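To make the empty result concrete, here is a minimal standard-library sketch of the strict rule as described above (intersect the valid times across all leads). It is not climpred's implementation: `add_months` and `valid_by_lead` are hypothetical names used for illustration, and it assumes observations exist at every month, so only the forecast side matters.

```python
from datetime import date


def add_months(d, n):
    """Shift a first-of-month date forward by n calendar months."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


# Same layout as the hindcasts above: yearly inits, monthly leads 0..23
inits = [date(2000, 1, 1), date(2001, 1, 1), date(2002, 1, 1)]
leads = range(24)

# valid_time = init + lead, collected separately for each lead
valid_by_lead = {lead: {add_months(i, lead) for i in inits} for lead in leads}

# Strict rule: keep only valid times that appear at every lead
common = set.intersection(*valid_by_lead.values())
print(len(common))  # prints 0: no valid time is shared by all 24 leads
```

Each lead only ever lands on one calendar month (lead 0 on January, lead 1 on February, and so on), so the intersection across leads cannot contain anything.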

However, users may still want to align on a common verification period. In this example, the "valid_time"s [2001-01-01 and 2002-01-01] are available at every lead at which they can occur (leads 0 and 12 months). Similarly,

[2001-02-01 and 2002-02-01] are available at leads 1 and 13 months,
[2001-03-01 and 2002-03-01] are available at leads 2 and 14 months, …
[2001-12-01 and 2002-12-01] are available at leads 11 and 23 months.

That is, by performing verification over the period 2001-01-01 - 2002-12-01 one includes:

the same dates at each lead where possible, given the init/lead frequencies
the same number of samples at each lead

period = [cftime.DatetimeGregorian(2001, 1, 1), cftime.DatetimeGregorian(2002, 12, 1)]

hind.where(
    np.logical_and(hind["valid_time"] >= period[0], hind["valid_time"] <= period[1])
).plot()
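The claim that this period gives the same number of samples at each lead can be checked with a short standard-library sketch. It mirrors the init/lead layout of the example but does not use climpred or xarray; `add_months` and `counts` are hypothetical names introduced here.

```python
from datetime import date


def add_months(d, n):
    """Shift a first-of-month date forward by n calendar months."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


inits = [date(2000, 1, 1), date(2001, 1, 1), date(2002, 1, 1)]
leads = range(24)
period = (date(2001, 1, 1), date(2002, 12, 1))

# For each lead, count the inits whose valid time falls inside the period
counts = {
    lead: sum(period[0] <= add_months(i, lead) <= period[1] for i in inits)
    for lead in leads
}
print(set(counts.values()))  # prints {2}: every lead keeps two samples
```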

How do folks feel about trying to restructure climpred.alignment._same_verifs_alignment() to use the alignment dates described in the example above? We would obviously do this such that the current behaviour is preserved for datasets that have common verification times across all leads.

Top GitHub Comments

dougiesquire commented, Dec 6, 2021

> Note: For your example to work you definitely need a monthly observation.

Yes exactly - sorry should’ve made that clearer

> The number of samples isn’t equal but won’t differ by more than +/- 1, IMO.

Good point. I messed that up, sorry. Now I realise there isn’t a single solution to the constraints I’ve posed.

I think there’d be value in an alignment something like what I’m suggesting. But it seems like I still need to work out the best approach for climpred in my head. In the past for my own work I’ve just specified a period over which to verify and kept all dates within that period. I chose this period judiciously to make sure that there are equal numbers of samples at each lead.

Happy to open a PR where I can flesh this out a little better. But it might take me a little while to get to it, sorry.

aaronspring commented, Dec 6, 2021

Thanks @dougiesquire. Now I get your approach. So valid_times do not need to match across leads but must lie between an upper and a lower bound. It reminds me a bit of sel(method='nearest') but with an upper and lower bound.

Note: For your example to work you definitely need a monthly observation.

So for your second example, ~~struck-through~~ dates are not verified:

init freq: 3 months, lead freq: 1 month:

lead 0	lead 1	lead 2	lead 3
2001-10	~~2001-11~~	~~2001-12~~	~~2002-01~~
2001-07	2001-08	2001-09	2001-10
2001-04	2001-05	2001-06	2001-07
~~2001-01~~	~~2001-02~~	~~2001-03~~	2001-04

The number of samples isn’t equal but won’t differ by more than +/- 1, IMO. Taking 2001-03 - 2001-11 makes three samples each.
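As a rough check of these counts, the sketch below tallies samples per lead for two verification windows, using only the standard library. The narrow window (2001-04 to 2001-10) is an assumption about how the struck-through table reads; `add_months` and `samples_per_lead` are hypothetical helpers, not climpred functions.

```python
from datetime import date


def add_months(d, n):
    """Shift a first-of-month date forward by n calendar months."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def samples_per_lead(inits, leads, start, end):
    """Count, for each lead, the inits whose valid time lies in [start, end]."""
    return {
        lead: sum(start <= add_months(i, lead) <= end for i in inits)
        for lead in leads
    }


# Inits every 3 months, monthly leads 0..3, as in the table
inits = [date(2001, m, 1) for m in (1, 4, 7, 10)]
leads = range(4)

narrow = samples_per_lead(inits, leads, date(2001, 4, 1), date(2001, 10, 1))
wide = samples_per_lead(inits, leads, date(2001, 3, 1), date(2001, 11, 1))
print(narrow)  # {0: 3, 1: 2, 2: 2, 3: 3}
print(wide)    # {0: 3, 1: 3, 2: 3, 3: 3}
```

Under these assumptions the narrow window leaves leads 1 and 2 one sample short, while widening it by one month on each side evens the counts out at three.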

I’d still prefer to make a new alignment keyword, maybe same_verifs_nearest or same_verifs_fill?

Would you lead a PR? The entry point is https://github.com/pangeo-data/climpred/blob/f6e05d18e54b539e991820c5616b0206b5584617/climpred/alignment.py#L125 and I am happy to give feedback and test.
