Add Sampling Strategy to DummyRegressor
See original GitHub issue

Describe the workflow you want to enable
To me, the best dummy regressor is one that just samples labels randomly. If you can’t beat that benchmark, it’s time to go home.
This is why I’ve added it to scikit-lego, a scikit-learn-compatible library that I maintain. A year after I made the tool, I got a message from @amueller suggesting that it might be something I could merge back into scikit-learn.
The goal of this issue is to confirm whether this would be a welcome change, after which I’d be more than happy to implement it.
Describe your proposed solution
I want to add the “uniform” strategy to the DummyRegressor and the DummyClassifier to uniformly sample the labels. This may require the signature of the object to change, though, because you’d want to have a random seed in such scenarios.
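To make the proposal concrete, here is a minimal sketch (my own illustration, not scikit-learn or scikit-lego code) of what a uniform-sampling dummy regressor with a random_state parameter might look like. It reads “uniformly sample the labels” as drawing uniformly between the smallest and largest training target; sampling from the observed labels themselves would be another reasonable reading.

```python
# Hypothetical sketch only -- not part of the scikit-learn API.
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_random_state
from sklearn.utils.validation import check_is_fitted


class UniformDummyRegressor(BaseEstimator, RegressorMixin):
    """Predict by drawing uniformly from [min(y), max(y)] seen during fit."""

    def __init__(self, random_state=None):
        # The extra random_state parameter is the signature change the
        # proposal mentions.
        self.random_state = random_state

    def fit(self, X, y):
        # Only the range of the training targets is stored; X is ignored,
        # as in the other dummy strategies.
        self.y_min_ = np.min(y)
        self.y_max_ = np.max(y)
        return self

    def predict(self, X):
        check_is_fitted(self, ["y_min_", "y_max_"])
        rng = check_random_state(self.random_state)
        return rng.uniform(self.y_min_, self.y_max_, size=len(X))
```

One design choice worth flagging: creating the RandomState inside predict means that, with an integer seed, repeated predict calls return the same draws. Whether that is the desired semantics would be part of the discussion this issue asks for.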
Issue Analytics
- State: Closed
- Created: 3 years ago
- Reactions: 1
- Comments: 6 (4 by maintainers)
Top GitHub Comments
I’m trying to conclude this issue. First, the proposed feature is available in scikit-lego, https://scikit-lego.netlify.app/api/dummy.html (@koaning, thanks for that package btw). Second, a constant prediction (mean or quantile) is usually a better reference model than pure random predictions, which can be arbitrarily bad (in terms of loss/score). Third, for regression tasks there are just so many possible distributions to draw from. Which one should we offer?
All in all, I’m -1 and therefore close the issue.
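For reference, the constant baselines this comment recommends already exist in scikit-learn; here is a quick illustration on toy data (made up here just to show the calls):

```python
import numpy as np
from sklearn.dummy import DummyRegressor

X = np.zeros((5, 1))                      # features are ignored by DummyRegressor
y = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

mean_baseline = DummyRegressor(strategy="mean").fit(X, y)
median_baseline = DummyRegressor(strategy="quantile", quantile=0.5).fit(X, y)

print(mean_baseline.predict(X))    # [3.6 3.6 3.6 3.6 3.6]
print(median_baseline.predict(X))  # [2. 2. 2. 2. 2.]
```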
@thomasjpfan they indeed solve the same problem. To me, though, the statement that a model hasn’t been able to beat randomness should make more people worried about using it than any other heuristic. If you haven’t beaten the “mean” strategy, that is certainly also reason for concern, but it is a weaker signal than not being able to beat entropy.
I do wonder if “uniform” is the best strategy to show randomness though. I might argue that “stratified” might be good to add for the regressor. I also realise that this is already implemented for the DummyClassifier.
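For classification, both sampling strategies this comment mentions do already exist in scikit-learn; a small example with made-up labels:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((6, 1))               # features are ignored by DummyClassifier
y = np.array([0, 0, 0, 0, 1, 1])   # imbalanced labels, 4:2

# "uniform" draws each class with equal probability.
uniform_clf = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)

# "stratified" draws classes according to the training frequencies.
stratified_clf = DummyClassifier(strategy="stratified", random_state=0).fit(X, y)

print(uniform_clf.predict(X))      # labels drawn uniformly over {0, 1}
print(stratified_clf.predict(X))   # labels drawn with P(0)=2/3, P(1)=1/3
```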