
Why do we need to use `loss.repeat(eval_batch_size)` in the accelerator gather loop?

See original GitHub issue

https://github.com/huggingface/transformers/blob/b1198a8440cc05f569b0bc22038993a1e5e707ab/examples/pytorch/language-modeling/run_mlm_no_trainer.py#L510

If I do not use this and simply do `accelerator.gather(loss)`, my code gets stuck at this point. But if I repeat the loss, it seems to work. Can you explain why this is the case?

Why do we later also use `losses = losses[: len(eval_dataset)]`?
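
For reference, here is a minimal sketch of the evaluation loop the two lines come from, with comments giving the usual rationale. It is modeled on the linked `run_mlm_no_trainer.py` but is not a verbatim copy; the surrounding setup and the exact argument name `args.per_device_eval_batch_size` are assumptions:

```python
import math
import torch
from accelerate import Accelerator

accelerator = Accelerator()
# Sketch only: model, eval_dataloader, eval_dataset, and args are assumed
# to be built and passed through accelerator.prepare() as in the script.

model.eval()
losses = []
for batch in eval_dataloader:
    with torch.no_grad():
        outputs = model(**batch)

    loss = outputs.loss  # 0-dim tensor: the mean loss over this batch
    # Repeating the scalar to a fixed per-device length means every process
    # hands gather() a tensor of identical shape, even when one process gets
    # a shorter final batch; a collective gather over mismatched (or 0-dim)
    # tensors is what can make a plain accelerator.gather(loss) hang.
    losses.append(accelerator.gather(loss.repeat(args.per_device_eval_batch_size)))

losses = torch.cat(losses)
# Distributed samplers pad the dataset so it splits evenly across processes,
# so the gathered tensor may end with duplicated samples; slicing to
# len(eval_dataset) drops those duplicates before averaging.
losses = losses[: len(eval_dataset)]

perplexity = math.exp(torch.mean(losses))
```

In short: the repeat gives every process a same-shaped tensor for the collective gather, and the final slice discards the samples the distributed sampler duplicated to make the dataset split evenly.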

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
LysandreJik commented on Aug 25, 2021

Hi @thakursc1, Sylvain is currently off until next week - he’ll answer your query when he’s back from his break. Thanks for your understanding.

0 reactions
github-actions[bot] commented on Sep 24, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.


Top Results From Across the Web

About the use of gather to compute metrics · Issue #226 - GitHub
I used the code below to train a naive model on MNIST data using 3 GPUs (on a ... At each step, accelerator.gather()...

Introducing Accelerate - Hugging Face
Accelerate was created for PyTorch users who like to have full control over their training loops but are reluctant to write (and maintain) ......

transformers.get_scheduler Example - Program Talk
We will let the accelerator handle device placement for us in this example. args ... loss = outputs.loss losses.append(accelerator.gather(loss.repeat(args.

ML Frameworks: Hugging Face Accelerate w/ Sylvain Gugger
... the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.
