Using different Regression models is not working properly

Hey, when I’m running the default pipeline of auto_ml I get errors along the way. It seems that only a certain subset of models can be executed in the same pipeline run. For example model_names = ['SGDRegressor', 'LGBMRegressor', 'XGBRegressor'] throws an error while model_names = ['LGBMRegressor', 'XGBRegressor'] and model_names = ['SGDRegressor'] work fine.

Additionally, compare_all_models=True combines models that do not work together, so it throws an error as well.

This is the script I’m running:

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical',
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train, compare_all_models=True)

ml_predictor.score(df_test, df_test.MEDV)

And this is the error I get. The error seems to be the same for every “not working” combination of models.

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.

If you have any issues, or new feature ideas, let us know at
You are running on version 2.9.10
Now using the model training_params that you passed in:
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}
Running basic data cleaning
Performing feature scaling
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}

About to run GridSearchCV on the pipeline for several models to predict MEDV
Fitting 2 folds for each of 6 candidates, totalling 12 fits
AttributeError                            Traceback (most recent call last)
<ipython-input-46-da4fdd716f78> in <module>()
     11 ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
---> 13 ml_predictor.train(df_train, compare_all_models=True)
     15 ml_predictor.score(df_test, df_test.MEDV)

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\ in train(***failed resolving arguments***)
    669         # This is our main logic for how we train the final model
--> 670         self.trained_final_model = self.train_ml_estimator(self.model_names, self._scorer, X_df, y)
    672         if self.ensemble_config is not None and len(self.ensemble_config) > 0:

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\ in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning, prediction_interval)
   1247             self.grid_search_params = grid_search_params
-> 1249             gscv_results = self.fit_grid_search(X_df, y, grid_search_params, refit=True)
   1251             trained_final_model = gscv_results.best_estimator_

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\ in fit_grid_search(self, X_df, y, gs_params, feature_learning, refit)
   1192                 # Note that we will only report analytics results on the final model that ultimately gets selected, and trained on the entire dataset
-> 1194, y)
   1196         if self.verbose:

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\model_selection\ in fit(self, X, y, groups, **fit_params)
    637                                   error_score=self.error_score)
    638           for parameters, (train, test) in product(candidate_params,
--> 639                                                    cv.split(X, y, groups)))
    641         # if one choose to see train score, "out" will contain train score info

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\ in __call__(self, iterable)
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\ in retrieve(self)
    697             try:
    698                 if getattr(self._backend, 'supports_timeout', False):
--> 699                     self._output.extend(job.get(timeout=self.timeout))
    700                 else:
    701                     self._output.extend(job.get())

~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\ in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    646     def _set(self, i, obj):

~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\ in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    422                         break
    423                     try:
--> 424                         put(task)
    425                     except Exception as e:
    426                         job, idx = task[:2]

~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\ in send(obj)
    369             def send(obj):
    370                 buffer = BytesIO()
--> 371                 CustomizablePickler(buffer, self._reducers).dump(obj)
    372                 self._writer.send_bytes(buffer.getvalue())
    373             self._send = send

~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\ in _pickle_method(m)
     47 # For handling parallelism edge cases
     48 def _pickle_method(m):
---> 49     if m.im_self is None:
     50         return getattr, (m.im_class, m.im_func.func_name)
     51     else:

AttributeError: 'function' object has no attribute 'im_self'
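
The last frame can be reproduced without auto_ml. The sketch below is only an illustration: Model and fit are hypothetical names, not part of auto_ml. It shows that in Python 3 a bound method no longer has the Python 2 attribute im_self, and which Python 3 attributes carry the same information.

```python
class Model:
    """Hypothetical stand-in for an estimator with an ordinary method."""

    def fit(self):
        return "fitted"


bound = Model().fit

# Python 2 exposed im_self / im_class / im_func on methods.
# Python 3 removed them, so this lookup raises AttributeError.
try:
    bound.im_self
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)

# Python 3 equivalents for a bound method:
#   im_self            -> __self__
#   im_func            -> __func__
#   im_func.func_name  -> __func__.__name__
#   im_class           -> __self__.__class__
print(bound.__self__.__class__.__name__, bound.__func__.__name__)
```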

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 7

Top GitHub Comments

above-c-level commented, Jun 4, 2018

I barely have any grasp on how to use GitHub and don’t know how to do a pull request, but the problem is that the file contains a function called _pickle_method which uses Python 2 method attributes (im_self, im_class, im_func). Those attributes do not exist in Python 3, and all of us are trying to run Python 3.

If you go to auto_ml\ and edit the function to this, it should work.

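def _pickle_method(m):
    # Python 3 names: im_self -> __self__, im_func -> __func__, func_name -> __name__
    if m.__self__ is None:
        # Kept from the Python 2 version; Python 3 bound methods always have a non-None __self__
        return getattr, (m.__self__.__class__, m.__func__.__name__)
    else:
        return getattr, (m.__self__, m.__func__.__name__)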

At least, I think that should fix it. No promises.
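
One way to check the idea in isolation is a round trip through pickle with a reducer of the same shape. This is a minimal sketch under stated assumptions, not auto_ml's code: pickle_method and calls are hypothetical names, Fraction is only a stand-in for any picklable object with Python-defined methods, and the unbound branch is dropped. The reducer is installed on a private dispatch table rather than the global copyreg registry.

```python
import copyreg
import io
import pickle
import types
from fractions import Fraction

calls = []  # records which methods went through the reducer


def pickle_method(m):
    """Reduce a bound method to (getattr, (instance, method_name))."""
    calls.append(m.__func__.__name__)
    return getattr, (m.__self__, m.__func__.__name__)


# Fraction is only a stand-in: a picklable object with Python-defined methods.
bound = Fraction(3, 4).limit_denominator

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
# Private dispatch table, so the global copyreg registry is left untouched.
pickler.dispatch_table = copyreg.dispatch_table.copy()
pickler.dispatch_table[types.MethodType] = pickle_method
pickler.dump(bound)

# Unpickling calls getattr(instance, method_name) and yields a bound method again.
restored = pickle.loads(buffer.getvalue())
print(restored.__self__, restored.__func__.__name__)
```

If the round trip succeeds, the restored object is a bound method on an equal instance, which is what the parallel grid search needs when it sends tasks to worker processes.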

above-c-level commented, Dec 5, 2018

Hey there! It appears that this repository is no longer being actively maintained. However, I still think it’s a great idea, so I’ve started working on my own version of it, which you can find here. It’s worth pointing out that I’ve dropped Python 2.7 support, so you’ll have to upgrade if you haven’t already.
