LocalCUDACluster freezes when running neural network prediction
Hi, I am new to dask and I am trying to write a workflow to run inference on large images. I have attached the code I've been using, which should reproduce the issue I am facing.
Basically, if I use the distributed client scheduler with `processes=False`, and also when not using a scheduler, I am able to run inference on my data.
However, when I try to use `LocalCUDACluster` as the scheduler, I run into issues.
- In general, the process crashes and doesn't complete.
- I have tried it with 1 GPU and with 2 GPUs, using single and multiple threads per GPU.
- It does seem to work for a subset of the data, but not with my full data (controlled via dim0 in the `size` param on line 83), though much more slowly.
Quite possibly I'm doing something incorrectly. The code should help reproduce this.
Thanks for your help figuring this out.
Anas
Attachment: Test_prediction.zip
Issue Analytics
- State:
- Created 3 years ago
- Comments: 18 (9 by maintainers)
Top GitHub Comments
Oh yes, to reduce the overall memory for testing you could reduce the `bsz` parameter to 8. This brings memory consumption down to ~18 GB or so.
I will test with the latest and circle back.
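As a rough illustration of why lowering `bsz` reduces memory use, here is a minimal sketch of batched prediction in plain Python. It is an assumption about the general pattern, not the code in the attachment: `predict_in_batches`, `predict_fn`, and the stand-in `double` model are hypothetical names. Only `bsz` samples are passed to the model at once, so the memory needed per call scales with the batch size.

```python
def predict_in_batches(predict_fn, samples, bsz=8):
    """Apply predict_fn to samples, at most bsz items at a time."""
    if bsz < 1:
        raise ValueError("bsz must be a positive integer")
    outputs = []
    for start in range(0, len(samples), bsz):
        batch = samples[start:start + bsz]  # never more than bsz items in flight
        outputs.extend(predict_fn(batch))
    return outputs


def double(batch):
    """Stand-in for a real model: doubles every input value."""
    return [2 * x for x in batch]


results = predict_in_batches(double, list(range(20)), bsz=8)
```

A smaller `bsz` means more calls to the model but a smaller peak working set per call, which is the trade-off the comment above relies on.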
Closing. @anaszain89, if you are still running into issues, feel free to reopen.