
Add Sampling Strategy to DummyRegressor

See original GitHub issue

Describe the workflow you want to enable

To me, the best dummy regressor is to just sample labels randomly. If you can’t beat that benchmark, it’s time to go home.

This is why I’ve added it to scikit-lego, a scikit-learn-compatible library that I maintain. A year after I built the tool, I got a message from @amueller suggesting it might be something I could merge back into scikit-learn.

The goal of this issue is to confirm whether this is a welcome change, after which I’d be more than happy to implement it.

Describe your proposed solution

I want to add a uniform strategy to both the DummyRegressor and the DummyClassifier that samples uniformly from the training labels. This would likely require the signature of the estimators to change, though, because you’d want a random seed in such a scenario. A minimal sketch of the idea is shown below.
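To make the proposal concrete, here is a minimal sketch of what such an estimator could look like. The class name UniformDummyRegressor is hypothetical, and a real implementation would presumably extend the existing strategy parameter of DummyRegressor rather than add a new class; this is just an illustration of the sampling behaviour being proposed.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_random_state
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class UniformDummyRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical sketch: predicts by sampling uniformly at random
    from the training labels, ignoring the features entirely."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Store the training labels; predictions are drawn from these.
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        check_is_fitted(self, "y_")
        X = check_array(X)
        rng = check_random_state(self.random_state)
        # Draw one training label uniformly at random per input row.
        return rng.choice(self.y_, size=X.shape[0], replace=True)
```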

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
lorentzenchr commented, Sep 5, 2022

I’m trying to conclude this issue. First, the proposed feature is available in scikit-lego: https://scikit-lego.netlify.app/api/dummy.html (@koaning, thanks for that package, btw). Second, a constant prediction (mean or quantile) is usually a better reference model than purely random predictions, which can be arbitrarily bad in terms of loss/score. Third, for regression tasks there are simply so many possible distributions to draw from. Which one should we offer?

All in all, I’m -1 and am therefore closing the issue.
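As a minimal numerical sketch of the second point (an editorial illustration, not part of the original comment): under squared error, predicting the mean has expected loss Var(y), while predicting an independently drawn label has expected loss of roughly 2·Var(y), so the random baseline is strictly worse.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=100_000)

# Constant "mean" strategy: MSE approaches Var(y).
mse_mean = np.mean((y - y.mean()) ** 2)

# Random-sampling strategy: predict an independently drawn label per point.
preds = rng.choice(y, size=y.size, replace=True)
mse_random = np.mean((y - preds) ** 2)

print(f"mean strategy MSE:   {mse_mean:.3f}")    # ~ Var(y) = 4
print(f"random sampling MSE: {mse_random:.3f}")  # ~ 2 * Var(y) = 8
```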

0 reactions
koaning commented, Jul 10, 2020

@thomasjpfan they indeed solve the same problem. To me, though, the statement that a model hasn’t been able to beat randomness should worry people about using it more than any other heuristic. If you haven’t beaten the “mean” strategy, then that is certainly also reason for concern, but it is a weaker signal than not being able to beat entropy.

I do wonder if uniform is the best strategy to showcase randomness, though. I might argue that stratified would be good to add for the regressor. I also realise that this is already implemented for the DummyClassifier.
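For reference, the classifier side of this does already exist in scikit-learn: DummyClassifier supports both the "uniform" and "stratified" strategies along with a random_state parameter. A quick sketch:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((6, 1))             # features are ignored by dummy estimators
y = np.array([0, 0, 0, 1, 1, 2])

# "uniform": each class is predicted with equal probability.
uniform = DummyClassifier(strategy="uniform", random_state=42).fit(X, y)

# "stratified": predictions follow the training class distribution.
stratified = DummyClassifier(strategy="stratified", random_state=42).fit(X, y)

print(uniform.predict(X))
print(stratified.predict(X))
```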


Top Results From Across the Web

  • sklearn.dummy.DummyRegressor
    DummyRegressor: Poisson regression and non-normal loss ... The quantile to predict using the “quantile” strategy. ... Sample weights.
  • Python sklearn.dummy.DummyRegressor() Examples
    This page shows Python examples of sklearn.dummy.DummyRegressor.
  • DummyRegressor Baseline for Time Series - Kaggle
    Explore and run machine learning code with Kaggle Notebooks | Using data from Store Item Demand Forecasting Challenge.
  • Comparing with a dummy regressor | Python Data Analysis ...
    The scikit-learn DummyRegressor class implements several strategies for random ... The strategies are as follo ...
  • Dummy Regressor - GeeksforGeeks
    The Dummy Regressor is a kind of Regressor that gives prediction based on simple strategies without paying any attention to the input Data ...
