
Feature Request: KitchenSink classification (Fits every classification algorithm in a single line)


Description

Example: machine learning practitioners often want to try every single classification algorithm on a dataset. How about a modular extension of GridSearchCV that takes MULTIPLE classification algorithms, each with its own parameter grid, and returns the model that best classifies the data according to a chosen scorer? (A sketch of how far plain GridSearchCV already gets you follows below.)
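
For comparison, GridSearchCV can already search over multiple estimators: param_grid accepts a list of dicts, and the pipeline's 'clf' step can itself be treated as a searched parameter. A minimal sketch of that pattern (the dataset and grids below are illustrative, not from the issue):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# The 'clf' step is itself a parameter; each dict in the list pairs one
# candidate estimator with its own grid, so unrelated parameters never cross.
pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression())])
param_grid = [
    {'clf': [LogisticRegression(max_iter=1000)], 'clf__C': (1, 10, 100)},
    {'clf': [SVC()], 'clf__C': (1, 10), 'clf__kernel': ('linear', 'rbf')},
]
search = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=5)
search.fit(X, y)
print(search.best_estimator_)  # the winning classifier with tuned parameters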

Steps/Code to Reproduce

Example:

"""
hyperparametertuning.py

(C) 2017 by Abhishek Babuji <abhishekb2209@gmail.com>

Contains methods to return a pipeline object and a dictionary containing
classifier parameters
"""

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


class HyperParameterTuning:
    """
    Contains methods to return a pipeline object and a dictionary containing
    classifier parameters
    """

    def __init__(self, classifier, vectorizer):
        """
        Args:
            classifier (One of 6 sklearn classifier objects): 'logreg', 'svm', 'nb',
                                                              'knn', 'xgboost', 'randomforests'
            vectorizer (CountVectorizer or TfidfVectorizer): Type of vector space model


        Returns:
            pipeline (sklearn pipeline object): Returns a pipeline object which is used
                                                by GridSearchCV
            model_params[self.classifier] (dict): Returns a dictionary of parameters
                                                  for the specified type of classifier

        """

        self.classifier = classifier
        self.vectorizer = vectorizer

    def get_pipeline(self):
        """
        Args:

            classifier (One of 6 sklearn classifier objects): 'logreg', 'svm', 'nb',
                                                              'knn', 'xgboost', 'randomforests'
            vectorizer (CountVectorizer or TfidfVectorizer): Type of vector space model


        Returns:
            pipeline (sklearn pipeline object): Returns a pipeline object which is
                                                used by GridSearchCV
            model_params[self.classifier] (dict): Returns a dictionary of parameters
                                                  for the specified type of classifier
        """

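        # Map classifier names to default estimator instances. Note that
        # 'xgboost' here maps to sklearn's GradientBoostingClassifier, not
        # the XGBoost library.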
        classifier_objects = {'logreg': LogisticRegression(),
                              'svm': SVC(),
                              'knn': KNeighborsClassifier(),
                              'xgboost': GradientBoostingClassifier(),
                              'randomforests': RandomForestClassifier(),
                              'nb': MultinomialNB()}
        pipeline = Pipeline([('vect', self.vectorizer),
                             ('clf', classifier_objects[self.classifier])])

        return pipeline

    def get_params(self):
        """
        Args:
            self


        Returns:
            model_params[self.classifier] (dict): Returns a dictionary of parameters for the
                                                  specified type of classifier

        """
        model_params = {'logreg': {'clf__C': (1, 10, 100), 'clf__penalty': ('l1', 'l2')},
                        'svm': {'clf__C': (1, 10, 100),
                                'clf__kernel': ('linear', 'poly', 'rbf', 'sigmoid')},
                        'knn': {'clf__n_neighbors': (5, 10, 50, 100)},
                        'xgboost': {'clf__n_estimators': (100, 500, 1000)},
                        'randomforests': {'clf__n_estimators': (100, 500, 1000)},
                        'nb': {'clf__alpha': (0, 1), 'clf__fit_prior': (True, False)}}
        return model_params[self.classifier]

An example like this would try EVERY classifier. Of course this is very expensive in computation and time… but for the small datasets where this is practical, we could brainstorm further about which default hyperparameters to tune for each algorithm, using a scorer of your choice. (See the usage sketch below.)
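
As a hypothetical driver for the class above (the toy text data here is invented, and some of the example grids may need adjusting on newer scikit-learn releases, e.g. penalty='l1' requires a compatible solver):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV

# Tiny invented text dataset, purely for illustration.
texts = ['good movie', 'bad movie', 'great film', 'awful film'] * 50
labels = [1, 0, 1, 0] * 50

best_score, best_model = -1.0, None
for name in ('logreg', 'svm', 'nb', 'knn', 'xgboost', 'randomforests'):
    tuner = HyperParameterTuning(name, TfidfVectorizer())
    search = GridSearchCV(tuner.get_pipeline(), tuner.get_params(),
                          scoring='accuracy', cv=3)
    search.fit(texts, labels)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(best_score)
print(best_model)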

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
amueller commented, Oct 5, 2018

Hey. There’s auto-sklearn that does that (somewhat smarter).

Some of their recent work is described here: https://ml.informatik.uni-freiburg.de/papers/18-AUTOML-AutoChalleng

I’ll probably soon post a simple implementation of that, but outside of sklearn. I think the space is still too volatile to do this in sklearn.
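
For reference, a minimal auto-sklearn sketch, not from this thread (it requires the separate auto-sklearn package, and the time budgets below are illustrative):

from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Searches over many sklearn models and hyperparameters within a time budget,
# then ensembles the best candidates found.
automl = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))  # accuracy of the resulting ensemble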

1 reaction
AbhishekBabuji commented, Oct 5, 2018

@amueller Absolutely. Feel free to close it. I am going to make it my serious hobby to contribute to sklearn. I write potato code though. So I have some ways to go!


