
ProgressBar ETA with IterableDataset where __len__ undefined

See original GitHub issue

❓ Questions/Help/Support

I’ve been successfully using ignite with regular Dataset/TensorDataset classes in the past. These have a fixed length and are tied to a DataLoader with a DistributedSampler. Keeping all other training hyper-parameters equal, I’ve always noticed that the ETA displayed by the ProgressBar decreases as I increase the number of nodes/GPUs.

Then, I switched to an IterableDataset where the length was computable in advance, so __len__ was defined. There is no DistributedSampler in this case because the dataset is iterable: the data files are grouped into distinct subsets in advance and assigned to different ranks. In this scenario too, keeping all else equal, the ETA displayed by the ProgressBar decreases when the number of nodes/GPUs increases. Some earlier discussion on this here: https://github.com/pytorch/ignite/issues/1263.
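For context, the rank-based file assignment described above can be sketched roughly as follows. This is a minimal, framework-free sketch under my own assumptions: the class and record format are hypothetical, and real code would subclass torch.utils.data.IterableDataset and read actual files.

```python
# Hypothetical sketch: an iterable-style dataset whose files are split
# across ranks in advance (no DistributedSampler), and whose total
# length is computable up front, so __len__ can be defined.

class ShardedIterableDataset:
    def __init__(self, files, samples_per_file, rank, world_size):
        # Round-robin split: each rank gets every world_size-th file.
        self.files = files[rank::world_size]
        self.samples_per_file = samples_per_file

    def __len__(self):
        # Length is computable in advance for this variant.
        return len(self.files) * self.samples_per_file

    def __iter__(self):
        for f in self.files:
            for i in range(self.samples_per_file):
                yield (f, i)  # stand-in for a real record


files = [f"shard_{i}.txt" for i in range(8)]
ds_rank0 = ShardedIterableDataset(files, samples_per_file=10, rank=0, world_size=2)
ds_rank1 = ShardedIterableDataset(files, samples_per_file=10, rank=1, world_size=2)
print(len(ds_rank0), len(ds_rank1))  # each rank sees half the data: 40 40
```

Each rank iterates only its own disjoint subset of files, so no sampler-level coordination is needed.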

Finally, I came across a setting with a massive dataset where the length (i.e., the number of data points) was not computable in advance. So I removed the __len__ definition, making the IterableDataset more generic.

Unfortunately, in this final setting, I find that the ETA displayed by the ProgressBar doesn’t decrease when the number of nodes/GPUs increases. I tried training for a fixed 50000 iterations, i.e., an epoch_length of 50000. I notice that if I train on 1 GPU, the ETA is much lower than if I train on more than 1 GPU. I also notice that the overall time taken per iteration is much lower when 1 GPU is used.
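One way to see why the ETA can behave this way (my own back-of-the-envelope reasoning, not ignite’s actual implementation): with a fixed epoch_length, a progress bar’s ETA is essentially the remaining iteration count times the average time per iteration. Adding GPUs does not shrink the iteration count, so the ETA only improves if the per-iteration time improves, and it gets worse if communication overhead makes each iteration slower.

```python
# Generic ETA arithmetic for a progress bar with a fixed epoch_length.
# This is a sketch of the general idea, not ignite's internal code.

def eta_seconds(epoch_length, completed, avg_iter_time):
    """ETA = remaining iterations * average time per iteration."""
    return (epoch_length - completed) * avg_iter_time

# Fixed 50000 iterations: the iteration count is the same regardless of
# world size, so only the per-iteration time moves the ETA.
print(eta_seconds(50000, 10000, 0.05))  # 2000.0 s at 50 ms/iteration
print(eta_seconds(50000, 10000, 0.08))  # 3200.0 s if iterations are slower
```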

I’m confused by this behavior; it doesn’t seem like I’m doing anything incorrect. Could you please explain what may be happening?

@vfdev-5

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9

Top GitHub Comments

1 reaction
g-karthik commented, Dec 24, 2020

@vfdev-5 oh, I see, I am using MetricsLambda with a callable method to perform a torch.distributed.all_reduce of some of the metrics like NLL and Accuracy, like this one with average_distributed_scalar().

Does this mean I necessarily need to stop doing that and switch to using idist.set_local_rank() with my local_rank, so that the sync_all_reduce decorator for the metrics gets triggered? Am I missing something else that needs to be upgraded?

I think it’d be cool if you could do a PR for the above repo to allow for these changes in ignite. It is a useful example repo to highlight new ignite functionality.
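The averaging pattern the comment refers to (an average_distributed_scalar-style helper built on torch.distributed.all_reduce) can be illustrated with a framework-free sketch. The all_reduce here is mocked as a plain sum over simulated per-rank values; in a real run each rank holds only its own scalar and the sum is computed collectively over the process group.

```python
# Hypothetical sketch of averaging a metric scalar across ranks.
# Real code would call torch.distributed.all_reduce(tensor, op=SUM)
# and then divide by the world size.

def average_distributed_scalar(per_rank_values):
    """Mocked all_reduce(SUM) over all ranks, then divide by world size."""
    world_size = len(per_rank_values)
    return sum(per_rank_values) / world_size

# e.g. per-rank NLL values after an evaluation pass:
per_rank_nll = [2.0, 2.4, 2.2, 2.6]
print(average_distributed_scalar(per_rank_nll))  # ≈ 2.3
```

The alternative the maintainers suggest is letting ignite’s metrics handle this synchronization internally (the sync_all_reduce decorator mentioned above) instead of reducing by hand.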

1 reaction
g-karthik commented, Dec 24, 2020

@vfdev-5 gotcha, but why would idist even be invoked in my case, leading to that warning? I’m not even importing it explicitly in my code; I’ve been using ignite for its other features. In fact, I didn’t even know about the idist feature until I saw this warning.

Once I get a better understanding of this, I’ll check the phrasing of the warning on the PR.
