Why do we need to use `loss.repeat(eval_batch_size)` in the `accelerator.gather` loop?
If I do not use this and simply call `accelerator.gather(loss)`, my code gets stuck at this point. But if I repeat the loss, it seems to work. Can you explain why this is the case?
Why do we also later use `losses = losses[: len(eval_dataset)]`?
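For readers landing on this question, here is a minimal sketch of what the two lines are most likely doing. It is a pure-Python simulation, not the Accelerate API: `shard_indices` and `gather` are hypothetical stand-ins, and the wrap-around padding of the last batch is an assumption modelled on how distributed samplers typically keep batches even across processes. The idea is that `loss` is a single batch-mean scalar, so repeating it `eval_batch_size` times gives one entry per sample and gives every process a tensor of the same shape to gather. The final slice then drops the entries that came from samples duplicated to fill the last batch. This sketch does not reproduce the hang itself.

```python
import math

def shard_indices(dataset_len, num_processes, batch_size):
    """Hypothetical sharding: one batch per process per step, wrapping
    around to the start of the dataset so every batch is full."""
    global_batch = num_processes * batch_size
    steps = math.ceil(dataset_len / global_batch)
    padded = [i % dataset_len for i in range(steps * global_batch)]
    return [
        [padded[s * global_batch + p * batch_size:
                s * global_batch + (p + 1) * batch_size]
         for p in range(num_processes)]
        for s in range(steps)
    ]

def gather(per_process_lists):
    """Stand-in for accelerator.gather: concatenate equal-length lists."""
    lengths = {len(x) for x in per_process_lists}
    assert len(lengths) == 1, "every process must contribute the same shape"
    return [v for x in per_process_lists for v in x]

dataset_len, num_processes, eval_batch_size = 10, 4, 2
sample_loss = [float(i) for i in range(dataset_len)]  # made-up per-sample losses

losses = []
for step in shard_indices(dataset_len, num_processes, eval_batch_size):
    per_process = []
    for batch in step:
        loss = sum(sample_loss[i] for i in batch) / len(batch)  # scalar batch mean
        per_process.append([loss] * eval_batch_size)  # mimics loss.repeat(eval_batch_size)
    losses.extend(gather(per_process))

padded_len = len(losses)        # includes entries for duplicated samples
losses = losses[:dataset_len]   # mimics losses[: len(eval_dataset)]
mean_loss = sum(losses) / len(losses)
print(padded_len, len(losses), mean_loss)
```

With 10 samples, 4 processes and a per-process batch size of 2, the gathered list holds 16 entries; the slice brings it back to 10 so the duplicated samples do not skew the averaged loss.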
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Hi @thakursc1, Sylvain is currently off until next week - he’ll answer your query when he’s back from his break. Thanks for your understanding.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.