Use `KernSeg` with model selection as described in JMLR paper

Sylvain Arlot, Alain Celisse and Zaid Harchaoui provide theory and a heuristic for model selection with KernSeg in their paper "A Kernel Multiple Change-point Algorithm via Model Selection". See Section 3.3.2, Theorem 2 and Appendix B.3.

The penalty they propose does not scale linearly with the number of change points, so sadly it is incompatible with the current implementation. Furthermore, the heuristic they propose requires the losses for a range of possible numbers of change points, which currently (to the best of my understanding) cannot be recovered without expensive refits.
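To make the non-linearity concrete, here is a minimal sketch. It assumes the penalty combines log C(n - 1, D - 1) and D, where n is the number of samples and D the number of segments; this is my reading of the paper, not a quote from it. The function names and the constants `c1` and `c2` are illustrative placeholders.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def penalty_shape(n_samples, n_bkps, c1=1.0, c2=1.0):
    """Assumed shape: c1 * log C(n - 1, D - 1) + c2 * D, with D = n_bkps + 1."""
    n_segments = n_bkps + 1
    return c1 * log_binom(n_samples - 1, n_segments - 1) + c2 * n_segments


n_samples = 500
penalties = [penalty_shape(n_samples, k) for k in range(6)]
increments = [b - a for a, b in zip(penalties, penalties[1:])]
# A penalty linear in the number of change points would give constant
# increments; here each extra change point costs less than the previous one.
print(increments)
```

Because the increments are not constant, a single per-change-point penalty value cannot reproduce this criterion.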

It would be great if this could be added.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
deepcharles commented, Dec 16, 2021

If you are up for it, that would be great. The idea would be to write a didactic example of this heuristic (in the spirit of this example, but with simulated data).

You only need to create a Jupyter notebook in docs/examples, and we’ll take care of integrating it into the docs.

0 reactions
mlondschien commented, Dec 14, 2021

Thanks @deepcharles and @oboulant for the input and thank you for working on ruptures.

I believe the following code implements the heuristic defined in the JMLR paper:

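import numpy as np
import ruptures as rpt
from scipy.special import betaln
from sklearn.linear_model import LinearRegression


def kernseg(X, n_bkps_max):
    """Select a segmentation of X with at most n_bkps_max change points."""
    algo = rpt.KernelCPD(kernel="rbf")

    # This relies on segmentations_dict holding one segmentation for each
    # number of change points from 1 to n_bkps_max after this call.
    algo.fit(X).predict(n_bkps=n_bkps_max)

    # Prepend the segmentation without change points, then compute total costs.
    segmentations_values = [[len(X)]] + list(algo.segmentations_dict.values())
    costs = [algo.cost.sum_of_costs(est) for est in segmentations_values]

    # log(n choose k) with n = n_bkps_max, via the log-beta function, see
    # https://stackoverflow.com/questions/21767690/python-log-n-choose-k
    log_nchoosek = [
        -betaln(1 + n_bkps_max - k, 1 + k) - np.log(n_bkps_max + 1)
        for k in range(0, n_bkps_max + 1)
    ]
    # Regress the costs on (log binomial, k) for k >= 0.6 * n_bkps_max.
    X_lm = np.array([log_nchoosek, range(0, n_bkps_max + 1)]).T
    lm = LinearRegression().fit(
        X_lm[int(0.6 * n_bkps_max) :, :], costs[int(0.6 * n_bkps_max) :]
    )
    # Penalize with minus twice the fitted linear part (intercept excluded).
    adjusted_costs = np.asarray(costs) - 2 * X_lm.dot(lm.coef_)

    # Breakpoints of the selected segmentation, with a leading 0.
    return [0] + segmentations_values[np.argmin(adjusted_costs)]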

Let me know if you would like to add this to the existing package. I could then open a PR.
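The selection step can also be tried without ruptures, given only a list of costs. The sketch below is a NumPy-only illustration, not the package's API: `select_n_bkps`, `fit_fraction` and the synthetic costs are hypothetical names and made-up data. It also assumes the binomial term counts the ways to place k change points among n_samples - 1 positions, whereas the snippet above uses n_bkps_max there.

```python
import math

import numpy as np


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def select_n_bkps(costs, n_samples, fit_fraction=0.6):
    """Return the number of change points minimizing cost plus fitted penalty.

    costs[k] is the total cost of the best segmentation with k change points.
    """
    costs = np.asarray(costs, dtype=float)
    n_bkps_max = len(costs) - 1
    ks = np.arange(n_bkps_max + 1)
    # Assumed penalty features: log C(n_samples - 1, k) and k, plus an intercept.
    shape = np.array([log_binom(n_samples - 1, int(k)) for k in ks])
    features = np.column_stack([shape, ks, np.ones(len(ks))])
    # Fit only on the largest models, as in the snippet above.
    start = int(fit_fraction * n_bkps_max)
    coef, *_ = np.linalg.lstsq(features[start:], costs[start:], rcond=None)
    # Penalty is minus twice the fitted linear part, intercept excluded.
    penalty = -2.0 * features[:, :2] @ coef[:2]
    return int(np.argmin(costs + penalty))


# Synthetic costs (made up for illustration): a large drop until 3 change
# points, then a slow decrease that follows the assumed penalty shape.
n_samples, n_bkps_max, true_n_bkps = 500, 20, 3
costs = [
    5000.0
    - log_binom(n_samples - 1, k)
    - 2.0 * k
    + 1000.0 * max(true_n_bkps - k, 0)
    for k in range(n_bkps_max + 1)
]
selected = select_n_bkps(costs, n_samples)
print(selected)
```

Taking the costs as input keeps the heuristic independent of how the segmentations were computed, so the same function could be fed costs from any solver that returns the best segmentation for every number of change points.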
