Updates to `compute_persistence`?
I split our persistence computations into `compute_persistence` (original) and `compute_persistence_pm` (your new addition) in b912731318ea923b4e360aced470a7a21ac6743e. We should discuss our thoughts here.
The new method is not totally clear to me, particularly for a “reference” ensemble. In a reference ensemble, the DP system is initialized from one realization of a reconstruction. So here, the persistence forecast should be applied to the full reconstruction (but subset to the time period the DP system covers), because the reference simulation is a continuous simulation. Subsetting it further would cause discontinuities in the dynamics and would not accurately represent a true persistence forecast (i.e., next year’s forecast is this year’s anomalies).
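In code terms, that might look something like this sketch (a rough illustration, not the package API; `reconstruction`, `ds`, and `nlags` are assumed names for the continuous reconstruction, the hindcast ensemble, and the number of lags):

```python
# Keep the continuous reconstruction and subset it to the DP window
# (plus nlags, so every init + lag selection stays in range), rather than
# stitching per-initialization segments together, which would introduce
# discontinuities in the dynamics.
first_init = int(ds['initialization'].min())
last_init = int(ds['initialization'].max())
reference = reconstruction.sel(time=slice(first_init, last_init + nlags))
```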
I think if you’re bootstrapping and are spinning off lead-time comparisons from a control, you can use your method. This is because each “initialization” has a self-contained time series you can compute persistence over with its own dynamics. This wouldn’t cause a jump in the time series like it would with a reference ensemble.
Thoughts?
ref:
```python
import numpy as np
import xarray as xr

# _check_xarray, _get_metric_function, _pearson_r, _rmse, _mse, and _mae
# are internal helpers defined elsewhere in the package.


def compute_persistence(ds, reference, nlags, metric='pearson_r', dim='time'):
    """
    Computes the skill of a persistence forecast from a reference
    (e.g., hindcast/assimilation) or control run.

    This simply applies some metric on the input out to some lag. The user
    should avoid computing persistence with prebuilt ACF functions in, e.g.,
    Python, MATLAB, or R, as they tend to use FFT methods for speed but
    incorporate error due to this.

    Currently supported metrics for persistence:
    * pearson_r
    * rmse
    * mse
    * mae

    Reference:
    * Chapter 8 (Short-Term Climate Prediction) in
      Van den Dool, Huug. Empirical Methods in Short-Term Climate Prediction.
      Oxford University Press, 2007.

    Args:
        ds (xarray object): The initialization years to get persistence from.
        reference (xarray object): The reference time series.
        nlags (int): Number of lags to compute persistence to.
        metric (str): Metric name to apply at each lag for the persistence
            computation. Default: 'pearson_r'
        dim (str): Dimension over which to compute persistence forecast.
            Default: 'time'

    Returns:
        pers (xarray object): Results of persistence forecast with the input
            metric applied.
    """
    _check_xarray(reference)
    metric = _get_metric_function(metric)
    if metric not in [_pearson_r, _rmse, _mse, _mae]:
        raise ValueError("""Please select between the following metrics:
            'pearson_r',
            'rmse',
            'mse',
            'mae'""")
    plag = []  # holds results of persistence for each lag
    inits = ds['initialization'].values
    # Drop the last nlags points of the reference up front so that
    # every init + lag selection stays in range.
    reference = reference.isel({dim: slice(0, -nlags)})
    for lag in range(1, 1 + nlags):
        ref = reference.sel({dim: inits + lag})  # verification at lead `lag`
        fct = reference.sel({dim: inits})  # persisted anomalies at init
        ref[dim] = fct[dim]  # align coords so the metric pairs values
        plag.append(metric(ref, fct, dim=dim))
    pers = xr.concat(plag, 'time')
    pers['time'] = np.arange(1, 1 + nlags)
    return pers
```
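A hypothetical call with synthetic annual data might look like this (the `years`, `reference`, and `ds` names are illustrative, and the internal helpers above are assumed to be defined):

```python
import numpy as np
import xarray as xr

years = np.arange(1950, 2011)  # 61 annual steps, as in the FOSI case
reference = xr.DataArray(np.random.randn(years.size),
                         coords={'time': years}, dims='time')
# Initializations chosen so that init + lag stays inside the trimmed
# reference for every lag up to nlags.
ds = xr.Dataset(coords={'initialization': years[:41]})
pers = compute_persistence(ds, reference, nlags=10, metric='pearson_r')
```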
Top GitHub Comments
That’s fine if this is modified in your proposed version and works. I just edited the original version for my features.
The key is including this (or a modified version) within the loop:
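A minimal sketch of that per-lag selection (the exact snippet isn’t preserved here, so this is an assumption; `reference`, `dim`, `nlags`, `metric`, and `plag` are as in `compute_persistence` above):

```python
for lag in range(1, 1 + nlags):
    # Trim per lag rather than once up front: at this lag there are
    # N - lag forecast/verification pairs instead of a fixed N - nlags.
    fct = reference.isel({dim: slice(0, -lag)})    # persisted values
    ref = reference.isel({dim: slice(lag, None)})  # values `lag` steps later
    ref[dim] = fct[dim]  # align coordinates so the metric pairs values
    plag.append(metric(ref, fct, dim=dim))
```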
This ensures that we use every data point possible for persistence. Does this updated version make sense? Previously, you trimmed off `nlags` from the control. That’s fine for something with 3000 data points, but for FOSI we only have 61 points at annual resolution. So this makes sure that at lag 1 it uses 60 points, at lag 2, 59, and so on, rather than just 50 at all lags.

@aaronspring, yes, that’s fine.