Example using dask.array.learn on non-trivial data
See original GitHub issueWe built dask.array.learn
a long while ago https://github.com/blaze/dask/pull/138 .
It’s a fairly trivial pairing of sklearn and dask.array (and honestly something that sklearn
does just fine with a for loop). Still, it’s something that more established libraries show off, so we might as well have a clear user example on how to combine a large HDF5 file with an sklearn classifier using dask.array, probably something like the following
from sklearn import MyFavoriteEstimator
e = MyFavoriteEstimator()
import h5py
X = h5py.File(...)['/X']
y = h5py.File(...)['/X']
import dask.array as da
X = da.from_array(X, chunks=(...))
y = da.from_array(y, chunks=(...))
da.learn.fit(e, X, y)
e.predict(new_X)
The hard thing here is to find a large and meaningful dataset and the right estimator within sklearn
that supports partial fitting. Then this work should be publicized in some way, perhaps both in documentation and in some sort of blogpost / notebook.
Issue Analytics
- State:
- Created 8 years ago
- Comments:7 (7 by maintainers)
Top Results From Across the Web
Array - Dask documentation
Dask Array implements a subset of the NumPy ndarray interface using blocked algorithms, cutting up the large array into many small arrays.
Read more >Experiment with Dask and TensorFlow - Matthew Rocklin
Prepare Data with Dask.array. For this toy example we're just going to use the mnist data that comes with TensorFlow. However, we'll ...
Read more >Pluralsight Tech Blog | Data Processing with Dask
In this post, we'll build a simple data pipeline for analytics and machine learning, working with text data in Dask.
Read more >Dask: Parallelize Everything - Medium
Dask provides the most widely-used data structures inherited from Pandas ... A quick example of visualizing this is to create a 2D array...
Read more >Machine Learning in Dask - Heartbeat
Processing a couple of gigabytes of data on one's laptop is usually an uphill task, unless the laptop has high RAM and a...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Dictionary learning with a large stack of images would be an interesting application.
This is ill-formed and stale. Closing.
Also: http://matthewrocklin.com/blog/work/2016/04/20/dask-distributed-part-5