Using different Regression models is not working properly
Hey, when I run the default auto_ml pipeline I get errors along the way. It seems that only certain subsets of models can be executed in the same pipeline run. For example, model_names = ['SGDRegressor', 'LGBMRegressor', 'XGBRegressor'] throws an error, while model_names = ['LGBMRegressor', 'XGBRegressor'] and model_names = ['SGDRegressor'] each work fine.
Additionally, setting compare_all_models=True combines models that do not work together and therefore throws an error as well.
This is the script I’m running:
from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
df_train, df_test = get_boston_dataset()
column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}
ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
ml_predictor.train(df_train, compare_all_models=True)
ml_predictor.score(df_test, df_test.MEDV)
And this is the error I get. The error seems to be the same for every “not working” combination of models.
Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.
If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}
Running basic data cleaning
Performing feature scaling
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}
********************************************************************************************
About to run GridSearchCV on the pipeline for several models to predict MEDV
Fitting 2 folds for each of 6 candidates, totalling 12 fits
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-46-da4fdd716f78> in <module>()
11 ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
12
---> 13 ml_predictor.train(df_train, compare_all_models=True)
14
15 ml_predictor.score(df_test, df_test.MEDV)
~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in train(***failed resolving arguments***)
668
669 # This is our main logic for how we train the final model
--> 670 self.trained_final_model = self.train_ml_estimator(self.model_names, self._scorer, X_df, y)
671
672 if self.ensemble_config is not None and len(self.ensemble_config) > 0:
~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning, prediction_interval)
1247 self.grid_search_params = grid_search_params
1248
-> 1249 gscv_results = self.fit_grid_search(X_df, y, grid_search_params, refit=True)
1250
1251 trained_final_model = gscv_results.best_estimator_
~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in fit_grid_search(self, X_df, y, gs_params, feature_learning, refit)
1192 # Note that we will only report analytics results on the final model that ultimately gets selected, and trained on the entire dataset
1193
-> 1194 gs.fit(X_df, y)
1195
1196 if self.verbose:
~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
637 error_score=self.error_score)
638 for parameters, (train, test) in product(candidate_params,
--> 639 cv.split(X, y, groups)))
640
641 # if one choose to see train score, "out" will contain train score info
~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
787 # consumption.
788 self._iterating = False
--> 789 self.retrieve()
790 # Make sure that we get a last message telling us we are done
791 elapsed_time = time.time() - self._start_time
~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\parallel.py in retrieve(self)
697 try:
698 if getattr(self._backend, 'supports_timeout', False):
--> 699 self._output.extend(job.get(timeout=self.timeout))
700 else:
701 self._output.extend(job.get())
~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
~\Anaconda3\envs\IMI-Devel\lib\multiprocessing\pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
422 break
423 try:
--> 424 put(task)
425 except Exception as e:
426 job, idx = task[:2]
~\Anaconda3\envs\IMI-Devel\lib\site-packages\sklearn\externals\joblib\pool.py in send(obj)
369 def send(obj):
370 buffer = BytesIO()
--> 371 CustomizablePickler(buffer, self._reducers).dump(obj)
372 self._writer.send_bytes(buffer.getvalue())
373 self._send = send
~\Anaconda3\envs\IMI-Devel\lib\site-packages\auto_ml\predictor.py in _pickle_method(m)
47 # For handling parallelism edge cases
48 def _pickle_method(m):
---> 49 if m.im_self is None:
50 return getattr, (m.im_class, m.im_func.func_name)
51 else:
AttributeError: 'function' object has no attribute 'im_self'
Issue Analytics
- State:
- Created 5 years ago
- Comments:7
Top GitHub Comments
I barely have any grasp on how to use GitHub and don’t know how to open a pull request, but the problem is that predictor.py contains a function called _pickle_method which uses Python 2 method attributes (im_self, im_class, im_func), and all of us are running Python 3, where those attributes no longer exist.
If you go to auto_ml\predictor.py and edit the function to this, it should work.
At least, I think that should fix it. No promises.
Hey there! It appears that this repository is no longer being actively maintained. However, I still think it’s a great idea, so I’ve started working on my own version of it, which you can find here. It’s worth pointing out that I’ve dropped Python 2.7 support, so you’ll have to upgrade if you haven’t already.