Using different Regression models is not working properly

Hey, when I run the default pipeline of auto_ml, I get errors along the way. It seems that only certain subsets of models can be run together in the same pipeline run. For example, model_names = ['SGDRegressor', 'LGBMRegressor', 'XGBRegressor'] throws an error, while model_names = ['LGBMRegressor', 'XGBRegressor'] and model_names = ['SGDRegressor'] each work fine.

Additionally, the compare_all_models=True parameter combines models that do not work together, so it throws an error as well.

This is the script I’m running:

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical',
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train, compare_all_models=True)

ml_predictor.score(df_test, df_test.MEDV)

And this is the error I get. The error seems to be the same for every “not working” combination of models.

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.

If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}
Running basic data cleaning
Performing feature scaling
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}


********************************************************************************************
About to run GridSearchCV on the pipeline for several models to predict MEDV
Fitting 2 folds for each of 6 candidates, totalling 12 fits
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-46-da4fdd716f78> in <module>()
     11 ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
     12 
---> 13 ml_predictor.train(df_train, compare_all_models=True)
     14 
     15 ml_predictor.score(df_test, df_test.MEDV)

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in train(***failed resolving arguments***)
    668 
    669         # This is our main logic for how we train the final model
--> 670         self.trained_final_model = self.train_ml_estimator(self.model_names, self._scorer, X_df, y)
    671 
    672         if self.ensemble_config is not None and len(self.ensemble_config) > 0:

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning, prediction_interval)
   1247             self.grid_search_params = grid_search_params
   1248 
-> 1249             gscv_results = self.fit_grid_search(X_df, y, grid_search_params, refit=True)
   1250 
   1251             trained_final_model = gscv_results.best_estimator_

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in fit_grid_search(self, X_df, y, gs_params, feature_learning, refit)
   1192                 # Note that we will only report analytics results on the final model that ultimately gets selected, and trained on the entire dataset
   1193 
-> 1194         gs.fit(X_df, y)
   1195 
   1196         if self.verbose:

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    637                                   error_score=self.error_score)
    638           for parameters, (train, test) in product(candidate_params,
--> 639                                                    cv.split(X, y, groups)))
    640 
    641         # if one choose to see train score, "out" will contain train score info

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\parallel.py in retrieve(self)
    697             try:
    698                 if getattr(self._backend, 'supports_timeout', False):
--> 699                     self._output.extend(job.get(timeout=self.timeout))
    700                 else:
    701                     self._output.extend(job.get())

~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    422                         break
    423                     try:
--> 424                         put(task)
    425                     except Exception as e:
    426                         job, idx = task[:2]

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\pool.py in send(obj)
    369             def send(obj):
    370                 buffer = BytesIO()
--> 371                 CustomizablePickler(buffer, self._reducers).dump(obj)
    372                 self._writer.send_bytes(buffer.getvalue())
    373             self._send = send

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in _pickle_method(m)
     47 # For handling parallelism edge cases
     48 def _pickle_method(m):
---> 49     if m.im_self is None:
     50         return getattr, (m.im_class, m.im_func.func_name)
     51     else:

AttributeError: 'function' object has no attribute 'im_self'

Top GitHub Comments

above-c-level commented, Jun 4, 2018

I barely have any grasp of how to use GitHub and don’t know how to open a pull request, but the problem is that predictor.py contains a function called _pickle_method that relies on Python 2-only attributes (im_self, im_class, im_func.func_name). Those no longer exist in Python 3, and all of us are trying to run Python 3.
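To illustrate the attribute rename described above, here is a minimal sketch. The Model class and the missing_attribute_message helper are made up for this example and are not part of auto_ml; the point is only to show which spellings exist on a bound method under Python 3.

```python
class Model:
    """Hypothetical stand-in for an estimator; not part of auto_ml."""

    def fit(self):
        return "fitted"


def missing_attribute_message(obj, name):
    """Return the AttributeError text for obj.<name>, or None if it exists."""
    try:
        getattr(obj, name)
    except AttributeError as exc:
        return str(exc)
    return None


bound = Model().fit

# Python 2 spelling: gone in Python 3, so the lookup fails.
print(missing_attribute_message(bound, "im_self"))

# Python 3 spelling: the instance and the underlying function are still there.
print(bound.__self__.__class__.__name__)  # Model
print(bound.__func__.__name__)            # fit
```

On CPython 3 the failed lookup should be reported against 'function' rather than 'method', as in the traceback, because a bound method forwards unknown attribute lookups to the function it wraps.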

If you go to auto_ml\predictor.py and edit the function to this, it should work.

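def _pickle_method(m):
    # Python 3 renamed the Python 2 attributes used here:
    # im_self -> __self__, im_func -> __func__, func_name -> __name__.
    # Python 3 has no unbound methods, so __self__ is always set and the
    # old "im_self is None" branch is no longer needed.
    return getattr, (m.__self__, m.__func__.__name__)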

At least, I think that should fix it. No promises.
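If you would rather try a Python 3 reducer in isolation before editing the installed package, the following is a rough sketch under a few assumptions. The names pickle_bound_method and roundtrip are hypothetical, fractions.Fraction is used only as a convenient pure-Python class from the standard library, and the reducer is installed on a private dispatch table rather than registered globally the way a library would. Recent Python 3 versions can typically pickle bound methods on their own, so this only demonstrates that the reducer itself behaves.

```python
import copyreg
import io
import pickle
import types
from fractions import Fraction


def pickle_bound_method(m):
    """Reduce a bound method to getattr(instance, method_name)."""
    return getattr, (m.__self__, m.__func__.__name__)


def roundtrip(method):
    """Pickle and unpickle a bound method using the reducer above."""
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    # Use a private copy of the dispatch table so that the process-wide
    # copyreg registry is left untouched.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.MethodType] = pickle_bound_method
    pickler.dump(method)
    return pickle.loads(buffer.getvalue())


# Fraction(1, 3).limit_denominator is an ordinary bound method.
restored = roundtrip(Fraction(1, 3).limit_denominator)
print(restored(10))  # 1/3
```

The restored object is a bound method on an unpickled copy of the instance, so calling it behaves like calling the original.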

above-c-level commented, Dec 5, 2018

Hey there! It appears that this repository is no longer being actively maintained. However, I still think it’s a great idea, so I’ve started working on my own version of it, which you can find here. It’s worth pointing out that I’ve dropped Python 2.7 support, so you’ll have to upgrade if you haven’t already.
