Sklearn pipeline and cross_val_score don't work for some transformers
See original GitHub issue

Hi,
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from dask_ml.decomposition import PCA
from dask_ml.wrappers import ParallelPostFit
from dask_ml.preprocessing import StandardScaler

clf = ParallelPostFit(estimator=GradientBoostingClassifier(), scoring='accuracy')

# dataset is a pandas DataFrame with the label in the last column
pipe = make_pipeline(PCA(), clf)             # fails
pipe = make_pipeline(StandardScaler(), clf)  # works

mysc = cross_val_score(pipe, dataset.iloc[:, :-1], dataset.iloc[:, -1])
This works for pipe = make_pipeline(StandardScaler(), clf), but not for pipe = make_pipeline(PCA(), clf), which raises:

AttributeError: 'DataFrame' object has no attribute 'chunks'

If I use RobustScaler() instead of StandardScaler(), I get:

AttributeError: 'int' object has no attribute 'ndim'

How can I fix this problem?
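One workaround, if dask-specific parallelism is not strictly required for the transform step, is to use scikit-learn's own PCA inside the pipeline, which accepts pandas DataFrames and NumPy arrays directly. This is a minimal sketch with synthetic data (make_classification stands in for the original dataset, which is not shown in the issue):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA  # sklearn's PCA, not dask_ml's
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the issue's dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# All-sklearn pipeline: every step accepts in-memory arrays/DataFrames,
# so cross_val_score can pass fold slices without needing dask chunks.
pipe = make_pipeline(StandardScaler(), PCA(n_components=5),
                     GradientBoostingClassifier())
scores = cross_val_score(pipe, X, y, cv=3)
```

The underlying mismatch is that dask_ml.decomposition.PCA expects a dask Array with a .chunks attribute, while cross_val_score hands each step plain pandas/NumPy fold slices.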
Issue Analytics
- Created 4 years ago
- Comments: 7 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
You should pass ndarrays to ParallelPostFit.fit if that's what the estimator (SVC in this case) is expecting.

On Sat, Nov 2, 2019 at 9:13 AM magehex notifications@github.com wrote:

@TomAugspurger Yes, I know that, but it is inconvenient. There also seems to be another bug in ParallelPostFit: I get the error TypeError: unhashable type: 'Array'. If I use GradientBoostingClassifier() instead, it works.
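The maintainer's suggestion (pass ndarrays, not DataFrames, to the wrapped estimator's fit) can be sketched without dask at all. The data here is synthetic and SVC is used only because the comment mentions it; the key line is the .to_numpy() conversion before fitting:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for a user's pandas DataFrame
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(4)])

clf = SVC()
# Convert the DataFrame to a plain ndarray before fitting, as the
# estimator (and ParallelPostFit wrapping it) expects array input.
clf.fit(df.to_numpy(), y)
preds = clf.predict(df.to_numpy())
```

When wrapping with ParallelPostFit, the same conversion applies: fit on the ndarray form of the training data rather than the DataFrame itself.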