LogisticRegression cannot train from Dask DataFrame


A simple example:

from dask import dataframe as dd
from dask_glm.datasets import make_classification
from dask_ml.linear_model import LogisticRegression

X, y = make_classification(n_samples=10000, n_features=2)

X = dd.from_dask_array(X, columns=["a","b"])
y = dd.from_array(y)

lr = LogisticRegression()
lr.fit(X, y)

This raises KeyError: (<class 'dask.dataframe.core.DataFrame'>,)

I have not had time to check whether other models are affected as well.

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 4
  • Comments: 33 (15 by maintainers)

Top GitHub Comments

TomAugspurger commented, Nov 6, 2017 (3 reactions)

Thanks. At the moment, the dask_glm-based estimators only work with Dask arrays, not DataFrames. You can use .values to get the array.

I’m hoping to put in some helpers for handling all the extra DataFrame metadata sometime soon, so this will be more consistent across estimators.

Abhishekdutt9 commented, Sep 30, 2022 (0 reactions)

Use lr.fit(X.values, y.values) instead


