
Why do we need to use `loss.repeat(eval_batch_size)` in the accelerator gather loop?

See original GitHub issue

https://github.com/huggingface/transformers/blob/b1198a8440cc05f569b0bc22038993a1e5e707ab/examples/pytorch/language-modeling/run_mlm_no_trainer.py#L510

If I do not use this and simply do `accelerator.gather(loss)`, my code gets stuck at this point. But if I repeat the loss, it seems to work. Can you explain why this is the case?

Why do we later also use `losses = losses[: len(eval_dataset)]`?
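
For reference, here is a minimal sketch of the evaluation loop the two lines come from, with comments giving the usual rationale. It is modeled on the linked `run_mlm_no_trainer.py` but is not a verbatim copy; the surrounding setup and the exact argument name `args.per_device_eval_batch_size` are assumptions:

```python
import math
import torch
from accelerate import Accelerator

accelerator = Accelerator()
# Sketch only: model, eval_dataloader, eval_dataset, and args are assumed
# to be built and passed through accelerator.prepare() as in the script.

model.eval()
losses = []
for batch in eval_dataloader:
    with torch.no_grad():
        outputs = model(**batch)

    loss = outputs.loss  # 0-dim tensor: the mean loss over this batch
    # Repeating the scalar to a fixed per-device length means every process
    # hands gather() a tensor of identical shape, even when one process gets
    # a shorter final batch; a collective gather over mismatched (or 0-dim)
    # tensors is what can make a plain accelerator.gather(loss) hang.
    losses.append(accelerator.gather(loss.repeat(args.per_device_eval_batch_size)))

losses = torch.cat(losses)
# Distributed samplers pad the dataset so it splits evenly across processes,
# so the gathered tensor may end with duplicated samples; slicing to
# len(eval_dataset) drops those duplicates before averaging.
losses = losses[: len(eval_dataset)]

perplexity = math.exp(torch.mean(losses))
```

In short: the repeat gives every process a same-shaped tensor for the collective gather, and the final slice discards the samples the distributed sampler duplicated to make the dataset split evenly.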

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
LysandreJik commented on Aug 25, 2021

Hi @thakursc1, Sylvain is currently off until next week - he’ll answer your query when he’s back from his break. Thanks for your understanding.

0 reactions
github-actions[bot] commented on Sep 24, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.


Top Results From Across the Web

About the use of gather to compute metrics · Issue #226 - GitHub
I used the code below to train a naive model on MNIST data using 3 GPUs (on a ... At each step, accelerator.gather()...

Introducing Accelerate - Hugging Face
Accelerate was created for PyTorch users who like to have full control over their training loops but are reluctant to write (and maintain) ......

transformers.get_scheduler Example - Program Talk
We will let the accelerator handle device placement for us in this example. args ... loss = outputs.loss losses.append(accelerator.gather(loss.repeat(args.

ML Frameworks: Hugging Face Accelerate w/ Sylvain Gugger
... the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.
