Feature Request: KitchenSink classification (Fits every classification algorithm in a single line)
Description
Example: Machine learning practitioners often want to try every classification algorithm on a dataset. How about a modularized add-on to GridSearchCV that takes MULTIPLE classification algorithms, each with its own parameter grid, and returns the model that best classifies the data according to a chosen scorer?
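For context, GridSearchCV can already come close to this: a pipeline step is itself a settable parameter, so param_grid may be a list of dicts that each swap a different estimator into the 'clf' step. A minimal sketch (the dataset and parameter values are illustrative, not from this issue):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# The 'clf' step is a placeholder; each dict in param_grid replaces it
# with a different estimator via the step-name parameter.
pipe = Pipeline([('vect', TfidfVectorizer()), ('clf', LogisticRegression())])
param_grid = [
    {'clf': [LogisticRegression()], 'clf__C': (1, 10, 100)},
    {'clf': [SVC()], 'clf__C': (1, 10, 100), 'clf__kernel': ('linear', 'rbf')},
]
search = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=3)
# data = fetch_20newsgroups(subset='train')   # any text dataset works here
# search.fit(data.data, data.target)
# print(search.best_estimator_, search.best_score_)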
Steps/Code to Reproduce
Example:
"""
hyperparametertuning.py
(C) 2017 by Abhishek Babuji <abhishekb2209@gmail.com>
Contains methods to return a pipeline object and a dictionary containing
classifier parameters
"""
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
class HyperParameterTuning:
    """
    Contains methods to return a pipeline object and a dictionary of
    parameters for the chosen classifier.
    """

    def __init__(self, classifier, vectorizer):
        """
        Args:
            classifier (str): One of 'logreg', 'svm', 'nb', 'knn',
                'xgboost', 'randomforests', naming the classifier to tune
            vectorizer (CountVectorizer or TfidfVectorizer): Vector space
                model used as the first pipeline step
        """
        self.classifier = classifier
        self.vectorizer = vectorizer
    def get_pipeline(self):
        """
        Returns:
            pipeline (sklearn.pipeline.Pipeline): A two-step pipeline
                (vectorizer, then the chosen classifier) for use with
                GridSearchCV
        """
        # Note: the 'xgboost' key maps to sklearn's GradientBoostingClassifier,
        # not to the separate xgboost library.
        classifier_objects = {'logreg': LogisticRegression(),
                              'svm': SVC(),
                              'knn': KNeighborsClassifier(),
                              'xgboost': GradientBoostingClassifier(),
                              'randomforests': RandomForestClassifier(),
                              'nb': MultinomialNB()}
        pipeline = Pipeline([('vect', self.vectorizer),
                             ('clf', classifier_objects[self.classifier])])
        return pipeline
    def get_params(self):
        """
        Returns:
            dict: Parameter grid for the chosen classifier; keys carry the
                'clf__' prefix so they target the pipeline's 'clf' step
        """
        # Caveats: MultinomialNB clips alpha=0 (with a warning), and the 'l1'
        # penalty for LogisticRegression requires a solver that supports it
        # (e.g. liblinear).
        model_params = {'logreg': {'clf__C': (1, 10, 100), 'clf__penalty': ('l1', 'l2')},
                        'svm': {'clf__C': (1, 10, 100),
                                'clf__kernel': ('linear', 'poly', 'rbf', 'sigmoid')},
                        'knn': {'clf__n_neighbors': (5, 10, 50, 100)},
                        'xgboost': {'clf__n_estimators': (100, 500, 1000)},
                        'randomforests': {'clf__n_estimators': (100, 500, 1000)},
                        'nb': {'clf__alpha': (0, 1), 'clf__fit_prior': (True, False)}}
        return model_params[self.classifier]
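Wiring the class above into GridSearchCV would look roughly like this (a sketch; X_train and y_train are placeholder documents and labels, not from the original issue):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Tune a single classifier; 'logreg' is one of the six supported keys.
tuner = HyperParameterTuning('logreg', CountVectorizer())
search = GridSearchCV(tuner.get_pipeline(), tuner.get_params(),
                      scoring='f1_macro', cv=5)
# search.fit(X_train, y_train)   # X_train: list of documents, y_train: labels
# print(search.best_params_, search.best_score_)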
An example like this would try EVERY classifier. This is, of course, computationally and time expensive, but for small datasets where machine learning is applicable, we could brainstorm further about which default hyperparameters to tune for each algorithm, using a scorer of your choice. A hypothetical helper sketching that try-everything behaviour follows below.
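This helper is not part of sklearn; it is one way the proposed behaviour could sit on top of the class above:

from sklearn.model_selection import GridSearchCV

def kitchen_sink(vectorizer, X_train, y_train, scoring='accuracy'):
    """Grid-search every supported classifier; return the best fitted search."""
    best = None
    for name in ('logreg', 'svm', 'nb', 'knn', 'xgboost', 'randomforests'):
        tuner = HyperParameterTuning(name, vectorizer)
        # GridSearchCV clones the pipeline, so reusing one vectorizer is safe.
        search = GridSearchCV(tuner.get_pipeline(), tuner.get_params(),
                              scoring=scoring, cv=5)
        search.fit(X_train, y_train)
        if best is None or search.best_score_ > best.best_score_:
            best = search
    return best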
Issue Analytics
- Created 5 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
Hey. There’s auto-sklearn that does that (somewhat smarter).
Some of their recent work is described here: https://ml.informatik.uni-freiburg.de/papers/18-AUTOML-AutoChalleng
I’ll probably soon post a simple implementation of that, but outside of sklearn. I think the space is still too volatile to do this in sklearn.
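For reference, auto-sklearn's documented entry point looks roughly like this (a sketch; the time budgets are arbitrary and X_train/y_train are placeholders):

import autosklearn.classification

# Searches over many sklearn classifiers and preprocessors automatically
# within the given time budget, then ensembles the best models it finds.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,   # total search budget in seconds
    per_run_time_limit=30)         # cap on each single model fit
# automl.fit(X_train, y_train)
# predictions = automl.predict(X_test)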
@amueller Absolutely. Feel free to close it. I am going to make it my serious hobby to contribute to sklearn. I write potato code though. So I have some ways to go!