
[BUG] generate() with do_sample does not finish on multi-GPU ZeRO Stage 3 with T5ForConditionalGeneration

See original GitHub issue

Describe the bug: When running on multiple GPUs with ZeRO Stage 3, only one GPU finishes generate().

When I use a single GPU, it works fine.

The bug occurs with T5ForConditionalGeneration but does not occur with GPT2LMHeadModel.

nvidia-smi output while the job is stuck (both GPUs stay at 100% utilization):

+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM...  On   | 00000000:00:0B.0 Off |                    0 |
| N/A   40C    P0    92W / 400W |  24222MiB / 40536MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   8  NVIDIA A100-SXM...  On   | 00000000:80:00.0 Off |                    0 |
| N/A   38C    P0    87W / 400W |  20572MiB / 40536MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

The dist.barrier() call never returns; the rank that finished generate() stays blocked there.

To Reproduce

    def generation_t5_from(self, encoded_example, max_tokens):
        #--------------------------- Tokenizing text of prompts -------------------------------#

        # Left-pad so every prompt ends at the same position before generation
        self.tokenizer.padding_side = 'left'
        prompts = self.tokenizer(
            encoded_example['inputs'],
            padding='longest',
            truncation=True,
            max_length=max_tokens,
            return_tensors="pt"
        ).to(self.device['lm'])
        self.tokenizer.padding_side = 'right'

        #---------------------------- Generation from the prompts ------------------------------#

        generated_token_length = self.num_steps
        generations = self.lm.generate(
            prompts.input_ids,
            max_length=generated_token_length,
            do_sample=True,
        )
        # With ZeRO Stage 3, only one rank ever reaches this barrier; the others
        # never return from generate()
        dist.barrier()
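
A minimal sketch of the DeepSpeed setup the reproduction appears to assume; the issue does not show how self.lm is created, so the model name, optimizer settings, and batch size below are placeholders rather than the reporter's actual values:

    # Hypothetical ZeRO Stage 3 setup (not from the issue); launched with e.g.
    #   deepspeed --num_gpus 2 repro.py
    import deepspeed
    from transformers import T5ForConditionalGeneration

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
        "zero_optimization": {"stage": 3},  # parameters are partitioned across ranks
    }

    model = T5ForConditionalGeneration.from_pretrained("t5-base")
    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    # self.lm in the method above is assumed to be this engine (or its wrapped module)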

Expected behavior: Every GPU finishes generate() and passes dist.barrier().

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/kyungmin.lee/anaconda3/envs/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 10.1
deepspeed install path ........... ['/home/kyungmin.lee/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.6+ae198e20, ae198e20, master
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.3


System info:

  • OS: Ubuntu 20.04
  • GPU count and types: one machine with 2x A100s
  • Python version: 3.8.12
  • Transformers 4.19.4

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
stas00 commented, Jul 26, 2022

you must use generate(..., synced_gpus=True) when using ZeRO stage-3

0 reactions
lkm2835 commented, Jul 26, 2022

Solution:

    gen = engine.generate(
        prompts.input_ids,
        max_length=128,
        do_sample=True,
        synced_gpus=True,
    )
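
Why this helps, briefly: under ZeRO Stage 3 the parameters are sharded across ranks, and every forward pass inside generate() needs collectives that gather those shards from all ranks. With sampling, ranks finish generation at different times, and a rank that stops generating stops participating in the collectives, so the remaining ranks block. synced_gpus=True keeps every rank stepping until the slowest one is done. A hedged sketch of the fix applied in the reporter's method (self.lm, self.num_steps, and prompts are taken from the reproduction above):

        generations = self.lm.generate(
            prompts.input_ids,
            max_length=self.num_steps,
            do_sample=True,
            synced_gpus=True,  # all ranks keep calling forward until the last rank finishes
        )
        dist.barrier()  # every rank now reaches the barrier, so it returns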

Top Results From Across the Web

  • [BUG][0.6.7] garbage output for multi-gpu with tutorial #2113: When running GPU = 2 started to see garbage output generated. ... I am running with 2 GPU instance with V100, also reproducible...
  • Efficient Training on Multiple GPUs - Hugging Face: The processing is done in parallel and all setups are synchronized at the end of each training step. TensorParallel (TP) - each tensor...
  • Problems with multi-gpus - MATLAB Answers - MathWorks: I have no problem training with a single gpu, but when I try to train with multiple gpus, matlab generates the following error:...
  • Fast Multi-GPU collectives with NCCL | NVIDIA Technical Blog: The first is that enough parallelism has not been exposed to efficiently saturate the processors. The second reason for poor scaling is that...
  • Multi-GPUs and Custom Training Loops in TensorFlow 2: Checkpoint within the tf.strategy.MirroredStrategy scope. The following is unrelated to the distributed training tutorial but to make life...
