Updates to `compute_persistence`?
I split our persistence computations into `compute_persistence` (original) and `compute_persistence_pm` (your new addition) in b912731318ea923b4e360aced470a7a21ac6743e. We should discuss our thoughts here.
The new method is not totally clear to me, particularly for a “reference” ensemble. In a reference ensemble, the DP system is initialized from one realization of a reconstruction. So here, the persistence forecast should be applied to the full reconstruction (but subset to the time period the DP system covers), because the reference simulation is a continuous simulation. Subsetting it further would cause discontinuities in the dynamics and would not accurately represent a true persistence forecast (i.e., next year’s forecast is this year’s anomalies).
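In code terms, that might look something like this sketch (a rough illustration, not the package API; `reconstruction`, `ds`, and `nlags` are assumed names for the continuous reconstruction, the hindcast ensemble, and the number of lags):

```python
# Keep the continuous reconstruction and subset it to the DP window
# (plus nlags, so every init + lag selection stays in range), rather than
# stitching per-initialization segments together, which would introduce
# discontinuities in the dynamics.
first_init = int(ds['initialization'].min())
last_init = int(ds['initialization'].max())
reference = reconstruction.sel(time=slice(first_init, last_init + nlags))
```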
I think if you’re bootstrapping and are spinning off lead-time comparisons from a control, you can use your method. This is because each “initialization” has a self-contained time series you can compute persistence over with its own dynamics. This wouldn’t cause a jump in the time series like it would with a reference ensemble.
Thoughts?
ref:
```python
import numpy as np
import xarray as xr

# _check_xarray, _get_metric_function, _pearson_r, _rmse, _mse, and _mae
# are internal helpers defined elsewhere in the package.


def compute_persistence(ds, reference, nlags, metric='pearson_r', dim='time'):
    """
    Computes the skill of a persistence forecast from a reference
    (e.g., hindcast/assimilation) or control run.

    This simply applies some metric on the input out to some lag. The user
    should avoid computing persistence with prebuilt ACF functions in, e.g.,
    Python, MATLAB, or R, as they tend to use FFT methods for speed but
    incorporate error due to this.

    Currently supported metrics for persistence:
    * pearson_r
    * rmse
    * mse
    * mae

    Reference:
    * Chapter 8 (Short-Term Climate Prediction) in
      Van den Dool, Huug. Empirical Methods in Short-Term Climate Prediction.
      Oxford University Press, 2007.

    Args:
        ds (xarray object): The initialization years to get persistence from.
        reference (xarray object): The reference time series.
        nlags (int): Number of lags to compute persistence to.
        metric (str): Metric name to apply at each lag for the persistence
            computation. Default: 'pearson_r'
        dim (str): Dimension over which to compute persistence forecast.
            Default: 'time'

    Returns:
        pers (xarray object): Results of persistence forecast with the input
            metric applied.
    """
    _check_xarray(reference)
    metric = _get_metric_function(metric)
    if metric not in [_pearson_r, _rmse, _mse, _mae]:
        raise ValueError("""Please select between the following metrics:
            'pearson_r',
            'rmse',
            'mse',
            'mae'""")
    plag = []  # holds results of persistence for each lag
    inits = ds['initialization'].values
    # Drop the last nlags points of the reference up front so that
    # every init + lag selection stays in range.
    reference = reference.isel({dim: slice(0, -nlags)})
    for lag in range(1, 1 + nlags):
        ref = reference.sel({dim: inits + lag})  # verification at lead `lag`
        fct = reference.sel({dim: inits})  # persisted anomalies at init
        ref[dim] = fct[dim]  # align coords so the metric pairs values
        plag.append(metric(ref, fct, dim=dim))
    pers = xr.concat(plag, 'time')
    pers['time'] = np.arange(1, 1 + nlags)
    return pers
```
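A hypothetical call with synthetic annual data might look like this (the `years`, `reference`, and `ds` names are illustrative, and the internal helpers above are assumed to be defined):

```python
import numpy as np
import xarray as xr

years = np.arange(1950, 2011)  # 61 annual steps, as in the FOSI case
reference = xr.DataArray(np.random.randn(years.size),
                         coords={'time': years}, dims='time')
# Initializations chosen so that init + lag stays inside the trimmed
# reference for every lag up to nlags.
ds = xr.Dataset(coords={'initialization': years[:41]})
pers = compute_persistence(ds, reference, nlags=10, metric='pearson_r')
```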
Top GitHub Comments
That’s fine if this is modified in your proposed version and works. I just edited the original version for my features.
The key is including this (or a modified version) within the loop:
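A minimal sketch of that per-lag selection (the exact snippet isn’t preserved here, so this is an assumption; `reference`, `dim`, `nlags`, `metric`, and `plag` are as in `compute_persistence` above):

```python
for lag in range(1, 1 + nlags):
    # Trim per lag rather than once up front: at this lag there are
    # N - lag forecast/verification pairs instead of a fixed N - nlags.
    fct = reference.isel({dim: slice(0, -lag)})    # persisted values
    ref = reference.isel({dim: slice(lag, None)})  # values `lag` steps later
    ref[dim] = fct[dim]  # align coordinates so the metric pairs values
    plag.append(metric(ref, fct, dim=dim))
```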
This ensures that we use every data point possible for persistence. Does this updated version make sense? Previously, you trimmed off `nlags` from the control. That’s fine for something with 3000 data points, but for FOSI we only have 61 points at annual resolution. So this makes sure that at lag 1 it uses 60 points, at lag 2, 59, and so on, rather than just 50 at all lags.

@aaronspring, yes, that’s fine.