Following CIFAR Tutorial but Code Forcing RANK variable
I am trying to get DeepSpeed working and have been following the CIFAR tutorial example. In the example, `local_rank=-1` and `dist_init_required=None`, since it runs on a single system (not distributed). However, it seems that DeepSpeed is forcing me to have `RANK`, `LOCAL_RANK`, and the other distributed environment variables set. Should `dist_init_required=False` instead?
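For reference, below is a minimal sketch of the kind of single-machine setup the CIFAR tutorial describes; it is not the tutorial's exact code. The `Net` module is a placeholder, and the DeepSpeed JSON config is assumed to be supplied on the command line via `--deepspeed_config`. With `dist_init_required` left at `None`, `deepspeed.initialize` decides on its own whether `torch.distributed` needs to be set up, which is where missing `RANK`/`LOCAL_RANK`/`WORLD_SIZE` environment variables can surface.

```python
import argparse

import deepspeed
import torch.nn as nn


class Net(nn.Module):
    """Placeholder model; the real CIFAR tutorial uses a small conv net."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32 * 32 * 3, 10)

    def forward(self, x):
        return self.fc(x.flatten(1))


parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=-1,
                    help='local rank passed by the deepspeed launcher; -1 when unset')
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed, --deepspeed_config, ...
args = parser.parse_args()

net = Net()
# dist_init_required=None lets DeepSpeed decide whether to initialize
# torch.distributed itself; passing False would skip that step entirely.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=net,
    model_parameters=net.parameters(),
    dist_init_required=None,
)
```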
Issue Analytics
- Created: 4 years ago
- Comments: 13 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thank you for this. I believe this was the issue. I have been using `nn.DataParallel` and should upgrade to the distributed method.

Well, this proved to be unrelated to this issue: one needs to forward `--local_rank` into the `args` passed to `deepspeed.initialize`; in the application I am trying to integrate DeepSpeed into, the flag was gobbled up by another consumer of argparse.
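For anyone hitting the same thing, below is a minimal, hedged sketch of one way to keep `--local_rank` from being consumed by a second, application-level ArgumentParser: parse the DeepSpeed-related flags first with `parse_known_args` and hand the leftovers to the other parser. The parser names and the `--epochs` flag are purely illustrative; the original application's parser layout is not known.

```python
import argparse

import deepspeed

# Peel off DeepSpeed's flags first so they are not swallowed by another parser.
ds_parser = argparse.ArgumentParser(add_help=False)
ds_parser.add_argument('--local_rank', type=int, default=-1,
                       help='local rank passed by the deepspeed launcher')
ds_parser = deepspeed.add_config_arguments(ds_parser)  # --deepspeed, --deepspeed_config, ...
ds_args, remaining = ds_parser.parse_known_args()

# Hypothetical application-level parser (the "other consumer" of argparse).
app_parser = argparse.ArgumentParser()
app_parser.add_argument('--epochs', type=int, default=1)
app_args = app_parser.parse_args(remaining)

# ds_args, which still carries local_rank, is the namespace to hand to
# deepspeed.initialize(args=ds_args, model=..., model_parameters=...).
print(ds_args.local_rank, app_args.epochs)
```

An alternative in the same spirit is to register `--local_rank` (and `deepspeed.add_config_arguments`) on the application's existing parser, so a single shared namespace can be passed to `deepspeed.initialize` directly.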