Use `KernSeg` with model selection as described in the JMLR paper

Sylvain Arlot, Alain Celisse and Zaid Harchaoui provide theory and a heuristic for model selection with KernSeg in their paper "A Kernel Multiple Change-point Algorithm via Model Selection". See Section 3.3.2, Theorem 2 and Appendix B.3.
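
As I read it, the selection rule is a penalized criterion over the number of segments D (k = D - 1 change points, n observations). The following is a hedged summary rather than a quote of Theorem 2: normalizing constants are folded into c_1 and c_2, and \widehat{L}(\widehat{\tau}_D) denotes the kernel least-squares cost of the best segmentation into D segments. The heuristic estimates c_1 and c_2 from a linear fit of the cost on the penalty shape over the largest models:

```latex
\begin{aligned}
\widehat{D} &= \operatorname*{arg\,min}_{1 \le D \le D_{\max}}
  \Big\{ \widehat{L}(\widehat{\tau}_D) + c_1 \log\binom{n-1}{D-1} + c_2 D \Big\},\\
\widehat{L}(\widehat{\tau}_D) &\approx a - s_1 \log\binom{n-1}{D-1} - s_2 D
  \quad \text{(fitted for large } D\text{)},
  \qquad c_1 = 2 s_1,\; c_2 = 2 s_2 .
\end{aligned}
```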

The penalty they propose does not scale linearly with the number of change points, so unfortunately it is incompatible with the current implementation, which only supports a penalty proportional to the number of change points. Furthermore, the heuristic requires the losses for a range of candidate numbers of change points, which currently (to the best of my understanding) cannot be recovered without expensive refits.
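
The non-linearity is easy to check numerically. Going from k to k + 1 change points raises log binom(n - 1, k) by log((n - 1 - k) / (k + 1)), which shrinks as k grows, whereas a penalty proportional to the number of change points adds the same constant every time. A small standard-library sketch; `log_binom` and `marginal_penalty` are hypothetical helper names, not part of ruptures:

```python
import math


def log_binom(n: int, k: int) -> float:
    """log(n choose k) via log-gamma, numerically stable for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def marginal_penalty(n_samples: int, k: int) -> float:
    """Increase of the term log binom(n_samples - 1, k) when going from k
    to k + 1 change points; equals log((n_samples - 1 - k) / (k + 1))."""
    return log_binom(n_samples - 1, k + 1) - log_binom(n_samples - 1, k)


if __name__ == "__main__":
    # A linear penalty would print the same number on every line.
    for k in range(5):
        print(k, round(marginal_penalty(1000, k), 3))
```

For n = 1000 the increments start at log(999) for k = 0 and decrease from there, so no single per-change-point constant reproduces this penalty.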

It would be great if this could be added.

Issue Analytics

Created 2 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments

deepcharles commented, Dec 16, 2021

If you are up for it, that would be great. The idea would be to write a didactic example of this heuristic (in the spirit of this example, but with simulated data).

You only need to create a Jupyter notebook in docs/examples and we'll take care of the integration into the docs.
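
A minimal sketch of such a didactic example, under stated assumptions: to stay self-contained and runnable without ruptures or scikit-learn, it swaps the rbf kernel cost of `KernelCPD` for the plain L2 (change-in-mean) cost and computes the exact segmentations with its own dynamic program. `l2_cost_curve` and `slope_heuristic` are hypothetical helpers, not ruptures API. The selection step follows the heuristic as implemented in this thread: a linear regression of the costs on (log binom(n - 1, k), k) over k >= 0.6 k_max, then minimizing the cost plus twice the fitted minimal penalty.

```python
import numpy as np
from math import lgamma


def l2_cost_curve(signal, n_bkps_max, min_size=2):
    """Exact optimal L2 (change-in-mean) costs and breakpoints for
    0, 1, ..., n_bkps_max change points, by dynamic programming.
    Hypothetical helper standing in for ruptures' rbf KernelCPD."""
    signal = np.asarray(signal, dtype=float).reshape(len(signal), -1)
    n = signal.shape[0]
    # Prefix sums give the cost of any segment [s, e) in O(1).
    csum = np.vstack([np.zeros(signal.shape[1]), np.cumsum(signal, axis=0)])
    csum2 = np.concatenate([[0.0], np.cumsum((signal ** 2).sum(axis=1))])

    def seg_costs(starts, end):
        # Sum of squared deviations from the mean on [s, end), for each s.
        totals = csum[end] - csum[starts]
        return csum2[end] - csum2[starts] - (totals ** 2).sum(axis=1) / (end - starts)

    # best[k, e]: minimal cost of signal[:e] with k change points;
    # arg[k, e]: position of the last of those change points.
    best = np.full((n_bkps_max + 1, n + 1), np.inf)
    arg = np.zeros((n_bkps_max + 1, n + 1), dtype=int)
    for e in range(min_size, n + 1):
        best[0, e] = seg_costs(np.array([0]), e)[0]
    for k in range(1, n_bkps_max + 1):
        for e in range((k + 1) * min_size, n + 1):
            starts = np.arange(k * min_size, e - min_size + 1)
            cand = best[k - 1, starts] + seg_costs(starts, e)
            i = int(np.argmin(cand))
            best[k, e], arg[k, e] = cand[i], starts[i]

    # Backtrack one segmentation per k (ruptures-style: last entry is n).
    costs, segmentations = [], []
    for k in range(n_bkps_max + 1):
        bkps, e = [n], n
        for j in range(k, 0, -1):
            e = int(arg[j, e])
            bkps.insert(0, e)
        costs.append(best[k, n])
        segmentations.append(bkps)
    return np.array(costs), segmentations


def slope_heuristic(costs, n_samples, frac=0.6):
    """Pick the number of change points: regress the costs of the largest
    models on the penalty shape (log binom(n_samples - 1, k), k), then
    minimize cost + 2 * (estimated minimal penalty)."""
    k = np.arange(len(costs))
    log_binom = np.array(
        [lgamma(n_samples) - lgamma(i + 1) - lgamma(n_samples - i) for i in k]
    )
    shape = np.column_stack([log_binom, k])
    start = int(frac * (len(costs) - 1))
    design = np.column_stack([np.ones(len(costs) - start), shape[start:]])
    coef, *_ = np.linalg.lstsq(design, costs[start:], rcond=None)
    # coef[1:] are the fitted (negative) slopes; -shape @ coef[1:] is the minimal penalty.
    penalized = costs - 2 * shape @ coef[1:]
    return int(np.argmin(penalized))


if __name__ == "__main__":
    # Simulated signal: two mean shifts (at 60 and 120) plus Gaussian noise.
    rng = np.random.default_rng(0)
    signal = np.concatenate([np.zeros(60), 5 * np.ones(60), np.zeros(80)])
    signal += rng.normal(size=signal.size)
    costs, segmentations = l2_cost_curve(signal, n_bkps_max=10)
    k_hat = slope_heuristic(costs, n_samples=len(signal))
    print("selected number of change points:", k_hat)
    print("breakpoints:", segmentations[k_hat])
```

The two regressors log binom(n - 1, k) and k are nearly collinear over the top of the range, so on short signals the fitted slopes can be unstable; plotting `costs` and the penalized criterion against k is a useful sanity check in a notebook.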

mlondschien commented, Dec 14, 2021

Thanks @deepcharles and @oboulant for the input, and thank you for working on ruptures.

I believe the following code implements the heuristic defined in the JMLR paper:

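```python
import numpy as np
import ruptures as rpt
from scipy.special import betaln
from sklearn.linear_model import LinearRegression


def kernseg(X, n_bkps_max):
    algo = rpt.KernelCPD(kernel="rbf")

    # One dynamic-programming pass; the optimal segmentations for
    # 1, ..., n_bkps_max change points are cached in algo.segmentations_dict.
    algo.fit(X).predict(n_bkps=n_bkps_max)

    # Segmentations for 0, 1, ..., n_bkps_max change points, in that order.
    segmentations_values = [[len(X)]] + [
        list(bkps) for _, bkps in sorted(algo.segmentations_dict.items())
    ]
    costs = np.array([algo.cost.sum_of_costs(est) for est in segmentations_values])

    # log(n choose k) = -betaln(1 + n - k, 1 + k) - log(n + 1), see
    # https://stackoverflow.com/questions/21767690/python-log-n-choose-k
    # The penalty shape is log binom(n_samples - 1, k) for k change points
    # (log binom(n - 1, D - 1) with D = k + 1 segments), so n = len(X) - 1.
    n = len(X) - 1
    log_nchoosek = [
        -betaln(1 + n - k, 1 + k) - np.log(n + 1)
        for k in range(0, n_bkps_max + 1)
    ]
    X_lm = np.array([log_nchoosek, range(0, n_bkps_max + 1)]).T

    # Slope heuristic: regress the costs of the largest models on the penalty
    # shape, then add twice the estimated minimal penalty (-X_lm @ coef).
    start = int(0.6 * n_bkps_max)
    lm = LinearRegression().fit(X_lm[start:, :], costs[start:])
    adjusted_costs = costs - 2 * X_lm.dot(lm.coef_)

    return [0] + segmentations_values[int(np.argmin(adjusted_costs))]
```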

Let me know if you would like this added to the package; I could then open a PR.
