
Add Sampling Strategy to DummyRegressor

See original GitHub issue

Describe the workflow you want to enable

To me, the best dummy regressor is to just sample labels randomly. If you can’t beat that benchmark, it’s time to go home.

This is why I’ve added it to scikit-lego, a scikit-learn-compatible library that I maintain. A year after I built the tool, I got a message from @amueller suggesting it might be something I could merge back into scikit-learn.

The goal of this issue is to confirm whether this is a welcome change, after which I’d be more than happy to implement it.

Describe your proposed solution

I want to add a uniform strategy to both the DummyRegressor and the DummyClassifier that samples uniformly from the training labels. This would likely require the signature of the estimators to change, though, because you’d want a random seed in such a scenario. A minimal sketch of the idea is shown below.
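To make the proposal concrete, here is a minimal sketch of what such an estimator could look like. The class name UniformDummyRegressor is hypothetical, and a real implementation would presumably extend the existing strategy parameter of DummyRegressor rather than add a new class; this is just an illustration of the sampling behaviour being proposed.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_random_state
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class UniformDummyRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical sketch: predicts by sampling uniformly at random
    from the training labels, ignoring the features entirely."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Store the training labels; predictions are drawn from these.
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        check_is_fitted(self, "y_")
        X = check_array(X)
        rng = check_random_state(self.random_state)
        # Draw one training label uniformly at random per input row.
        return rng.choice(self.y_, size=X.shape[0], replace=True)
```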

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
lorentzenchr commented, Sep 5, 2022

I’m trying to conclude this issue. First, the proposed feature is available in scikit-lego: https://scikit-lego.netlify.app/api/dummy.html (@koaning, thanks for that package, btw). Second, a constant prediction (mean or quantile) is usually a better reference model than purely random predictions, which can be arbitrarily bad in terms of loss/score. Third, for regression tasks there are simply so many possible distributions to draw from. Which one should we offer?

All in all, I’m -1 and am therefore closing the issue.
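As a minimal numerical sketch of the second point (an editorial illustration, not part of the original comment): under squared error, predicting the mean has expected loss Var(y), while predicting an independently drawn label has expected loss of roughly 2·Var(y), so the random baseline is strictly worse.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=100_000)

# Constant "mean" strategy: MSE approaches Var(y).
mse_mean = np.mean((y - y.mean()) ** 2)

# Random-sampling strategy: predict an independently drawn label per point.
preds = rng.choice(y, size=y.size, replace=True)
mse_random = np.mean((y - preds) ** 2)

print(f"mean strategy MSE:   {mse_mean:.3f}")    # ~ Var(y) = 4
print(f"random sampling MSE: {mse_random:.3f}")  # ~ 2 * Var(y) = 8
```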

0 reactions
koaning commented, Jul 10, 2020

@thomasjpfan they indeed solve the same problem. To me, though, the statement that a model hasn’t been able to beat randomness should worry people about using it more than any other heuristic. If you haven’t beaten the “mean” strategy, then that is certainly also reason for concern, but it is a weaker signal than not being able to beat entropy.

I do wonder if uniform is the best strategy to showcase randomness, though. I might argue that stratified would be good to add for the regressor. I also realise that this is already implemented for the DummyClassifier.
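For reference, the classifier side of this does already exist in scikit-learn: DummyClassifier supports both the "uniform" and "stratified" strategies along with a random_state parameter. A quick sketch:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((6, 1))             # features are ignored by dummy estimators
y = np.array([0, 0, 0, 1, 1, 2])

# "uniform": each class is predicted with equal probability.
uniform = DummyClassifier(strategy="uniform", random_state=42).fit(X, y)

# "stratified": predictions follow the training class distribution.
stratified = DummyClassifier(strategy="stratified", random_state=42).fit(X, y)

print(uniform.predict(X))
print(stratified.predict(X))
```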


Top Results From Across the Web

  • sklearn.dummy.DummyRegressor
    DummyRegressor: Poisson regression and non-normal loss ... The quantile to predict using the “quantile” strategy. ... Sample weights.
  • Python sklearn.dummy.DummyRegressor() Examples
    This page shows Python examples of sklearn.dummy.DummyRegressor.
  • DummyRegressor Baseline for Time Series - Kaggle
    Explore and run machine learning code with Kaggle Notebooks | Using data from Store Item Demand Forecasting Challenge.
  • Comparing with a dummy regressor | Python Data Analysis ...
    The scikit-learn DummyRegressor class implements several strategies for random ... The strategies are as follo ...
  • Dummy Regressor - GeeksforGeeks
    The Dummy Regressor is a kind of Regressor that gives prediction based on simple strategies without paying any attention to the input Data ...
