Extend scope of `alignment="same_verifs"`
The `same_verifs` alignment generates a list of times from `verif` that are present in the forecast at every lead (from any init). This list will always be empty when the init frequency is lower than the lead frequency. Is there scope to extend `same_verifs` to deal appropriately with such cases? I'll try to give a concrete example of what I mean below.
Consider the following hindcasts:
```python
import cftime
import climpred
import numpy as np
import xarray as xr

# Hindcasts initialised every year with monthly lead
init = xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="AS")
lead = range(0, 24)
data = np.random.random((len(init), len(lead)))
hind = xr.DataArray(data, coords=[init, lead], dims=["init", "lead"], name="var")
hind["lead"].attrs["units"] = "months"
hind = climpred.utils.add_time_from_init_lead(hind)
```
I currently can’t use “same_verifs” with this data because there are no common times available at all leads.
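As a stdlib-only illustration of why no common times exist here (a sketch, not climpred's implementation), the snippet below encodes months as integers (`year * 12 + month - 1`, an assumption made purely for illustration), builds the set of valid times reachable at each lead for the yearly inits and monthly leads above, and intersects them the way a `same_verifs`-style alignment effectively requires.

```python
def month_index(year, month):
    """Encode (year, month) as an integer month count (illustrative assumption)."""
    return year * 12 + (month - 1)


# Yearly inits (Jan 2000, 2001, 2002) and monthly leads 0..23, as in the example
inits = [month_index(year, 1) for year in (2000, 2001, 2002)]
leads = range(24)

# Valid times reachable at each lead: init + lead, in months
valid_by_lead = {lead: {init + lead for init in inits} for lead in leads}

# A same_verifs-style alignment needs times that occur at *every* lead
common = set.intersection(*valid_by_lead.values())
print(sorted(common))  # → []
```

With yearly inits, lead 0 only reaches January dates and lead 1 only February dates, so the intersection is already empty after the first two leads.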
However, users may still want to align based on a common verification period. For example, in this case the `valid_time`s [2001-01-01 and 2002-01-01] are available at all possible leads at which they can occur (leads 0 and 12 months). Similarly,
- [2001-02-01 and 2002-02-01] are available at leads 1 and 13 months,
- [2001-03-01 and 2002-03-01] are available at leads 2 and 14 months, …
- [2001-12-01 and 2002-12-01] are available at leads 11 and 23 months.
That is, by performing verification over the period 2001-01-01 to 2002-12-01, one includes:
- the same dates at each lead where possible, given the init/lead frequencies
- the same number of samples at each lead
```python
period = [cftime.DatetimeGregorian(2001, 1, 1), cftime.DatetimeGregorian(2002, 12, 1)]
hind.where(
    np.logical_and(hind["valid_time"] >= period[0], hind["valid_time"] <= period[1])
).plot()
```
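The two claims above (same dates where possible, same number of samples at each lead) can be checked with a small stdlib-only sketch. It is an illustration rather than climpred code: months are encoded as integers (an assumption for illustration), and for each lead it keeps only the valid times that fall inside the 2001-01 to 2002-12 period.

```python
def month_index(year, month):
    """Encode (year, month) as an integer month count (illustrative assumption)."""
    return year * 12 + (month - 1)


def label(idx):
    """Decode an integer month count back to a 'YYYY-MM' string."""
    year, m0 = divmod(idx, 12)
    return f"{year:04d}-{m0 + 1:02d}"


inits = [month_index(y, 1) for y in (2000, 2001, 2002)]  # yearly inits
leads = range(24)  # monthly leads 0..23
start, end = month_index(2001, 1), month_index(2002, 12)  # verification period

# For each lead, the valid times (init + lead) that fall inside the period
in_period = {
    lead: sorted(init + lead for init in inits if start <= init + lead <= end)
    for lead in leads
}

print([label(t) for t in in_period[0]])  # → ['2001-01', '2002-01']
print([label(t) for t in in_period[12]])  # → ['2001-01', '2002-01']
print({len(v) for v in in_period.values()})  # → {2}
```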
How do folks feel about restructuring `climpred.alignment._same_verifs_alignment()` to use the alignment dates described in the example above? We would do this such that the current behaviour is preserved for datasets that have common verification times across all leads.
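One way the proposal could be structured, as a hedged sketch only: `period_alignment` below is a hypothetical helper, not climpred's `_same_verifs_alignment`, and it works on integer month counts rather than climpred's cftime coordinates. It keeps the current `same_verifs` result whenever some verification time is common to every lead, and otherwise falls back to every valid time inside a user-supplied period.

```python
def month_index(year, month):
    """Encode (year, month) as an integer month count (illustrative assumption)."""
    return year * 12 + (month - 1)


def period_alignment(inits, leads, period=None):
    """Hypothetical helper returning {lead: [inits to verify]}.

    `inits` are integer month counts, `leads` are lead times in months and
    `period` is an optional inclusive (start, end) pair of month counts.
    """
    valid_by_lead = {lead: {init + lead for init in inits} for lead in leads}
    common = set.intersection(*valid_by_lead.values())
    if common:
        # Current same_verifs behaviour: identical verification times at every lead
        def keep(t):
            return t in common
    elif period is not None:
        # Proposed extension: any valid time inside the period is verified
        start, end = period

        def keep(t):
            return start <= t <= end
    else:
        raise ValueError("no verification times common to all leads; pass a period")
    return {lead: sorted(i for i in inits if keep(i + lead)) for lead in leads}


# Monthly inits with short leads: common times exist, so the old behaviour applies
monthly = [month_index(2000, m) for m in range(1, 13)]
same = period_alignment(monthly, range(3))
print({lead: len(v) for lead, v in same.items()})  # → {0: 10, 1: 10, 2: 10}

# Yearly inits with monthly leads: no common times, so the period is used instead
yearly = [month_index(y, 1) for y in (2000, 2001, 2002)]
period = (month_index(2001, 1), month_index(2002, 12))
extended = period_alignment(yearly, range(24), period)
print({len(v) for v in extended.values()})  # → {2}
```

Falling back only when the intersection is empty is one possible design; a separate keyword would avoid changing what `same_verifs` means at all.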
Top GitHub Comments
Yes, exactly. Sorry, I should've made that clearer.
Good point. I messed that up, sorry. Now I realise there isn’t a single solution to the constraints I’ve posed.
I think there'd be value in an alignment along the lines of what I'm suggesting. But it seems I still need to work out the best approach for `climpred` in my head. In the past, for my own work, I've just specified a period over which to verify and kept all dates within that period. I chose this period judiciously to make sure that there are equal numbers of samples at each lead.

Happy to open a PR where I can flesh this out a little better, but it might take me a little while to get to it, sorry.
Thanks @dougiesquire. Now I get your approach. So `valid_time`s do not need to match across `lead`, but must lie between a lower and an upper bound. It reminds me a bit of `sel(method='nearest')`, but with an upper and lower bound.

Note: for your example to work you definitely need a monthly `observation`.

So for your second example (`init` freq: 3 months, `lead` freq: 1 month), the struck-through dates do not verify:

- 2001-11
- 2001-12
- 2002-01
- 2001-01
- 2001-02
- 2001-03

The number of samples isn't equal, but it won't differ by more than +/- 1 IMO. Taking 2001-03 to 2001-11 gives three samples each.

I'd still prefer to make a new `alignment` keyword, maybe `same_verifs_nearest` or `same_verifs_fill`?

Would you lead a PR? The entry point is https://github.com/pangeo-data/climpred/blob/f6e05d18e54b539e991820c5616b0206b5584617/climpred/alignment.py#L125 and I am happy to give feedback and test.
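The +/- 1 bound for the 3-monthly-init case can be checked with a short stdlib-only sketch. The second example's actual init dates are not reproduced in this thread, so the code assumes quarterly inits (January, April, July, October) from 1999 to 2002 and monthly leads 0 to 11; `samples_per_lead` and the integer month encoding are hypothetical illustrations.

```python
def month_index(year, month):
    """Encode (year, month) as an integer month count (illustrative assumption)."""
    return year * 12 + (month - 1)


# Assumed setup: quarterly inits 1999-01 .. 2002-10, monthly leads 0..11
inits = [month_index(y, m) for y in range(1999, 2003) for m in (1, 4, 7, 10)]
leads = range(12)


def samples_per_lead(start, end):
    """Count, per lead, the inits whose valid time (init + lead) lies in [start, end]."""
    return {lead: sum(start <= i + lead <= end for i in inits) for lead in leads}


# 2001-03 .. 2001-11 is a 9-month window: a multiple of the 3-month init spacing
nine = samples_per_lead(month_index(2001, 3), month_index(2001, 11))
print(set(nine.values()))  # → {3}

# A 10-month window cannot be split evenly, but counts differ by at most one
ten = samples_per_lead(month_index(2001, 3), month_index(2001, 12))
print(sorted(set(ten.values())))  # → [3, 4]
```

Any window of consecutive months contains either floor(n / 3) or ceil(n / 3) quarterly inits, so whenever the inits cover the window at every lead, the counts differ by at most one, and they are equal when the window length is a multiple of 3, whichever months the quarterly inits fall in.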