
[BUG] Unable to make inference with gpt-neo-1.3B model


Describe the bug
Unable to make inference with the gpt-neo-1.3B model.

To Reproduce
Code snippet used:

# filename: test_gpt_neo_ds.py
import os

import torch
import deepspeed
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

world_size = int(os.getenv('WORLD_SIZE', '2'))
local_rank = int(os.getenv('LOCAL_RANK', '0'))

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
# model.to(local_rank)  # works when the init_inference() call below is skipped

# Wrap the transformer stack with the DeepSpeed inference engine
# (tensor parallelism across `world_size` GPUs, with kernel injection).
model.base_model = deepspeed.init_inference(
    model.base_model,
    mp_size=world_size,
    replace_with_kernel_inject=True,
    replace_method='auto'
)

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

prompt = (
    "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
    "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
    "researchers was the fact that the unicorns spoke perfect English."
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(local_rank)

gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100
)

gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(f"Generated Text - {gen_text}")

Expected behavior
Inference should run and print generated text. Instead, it fails with the following stack trace:

$ deepspeed --num_gpus 2 test_gpt_neo_ds.py
[2022-03-02 19:03:59,744] [WARNING] [runner.py:148:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2022-03-02 19:04:00,549] [INFO] [runner.py:420:main] cmd = /home/ubuntu/anaconda3/envs/pytorch_p38/bin/python3.8 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 test_gpt_neo_ds.py
[2022-03-02 19:04:01,661] [INFO] [launch.py:96:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2022-03-02 19:04:01,661] [INFO] [launch.py:102:main] nnodes=1, num_local_procs=2, node_rank=0
[2022-03-02 19:04:01,661] [INFO] [launch.py:115:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2022-03-02 19:04:01,661] [INFO] [launch.py:116:main] dist_world_size=2
[2022-03-02 19:04:01,661] [INFO] [launch.py:118:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2022-03-02 19:04:22,947] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.5.10, git-hash=unknown, git-branch=unknown
[2022-03-02 19:04:22,947] [INFO] [engine.py:127:_init_quantization_setting] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2022-03-02 19:04:22,947] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2022-03-02 19:04:23,095] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.5.10, git-hash=unknown, git-branch=unknown
[2022-03-02 19:04:23,096] [INFO] [engine.py:127:_init_quantization_setting] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2022-03-02 19:04:23,096] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
Using /home/ubuntu/.cache/torch_extensions/py38_cu111 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py38_cu111 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py38_cu111/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.7159159183502197 seconds
DeepSpeed Transformer Inference config is  {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 16, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'specialized_mode': False, 'triangular_masking': True, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'return_tuple': True, 'mlp_after_attn': True}
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.7038288116455078 seconds
DeepSpeed Transformer Inference config is  {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 16, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'specialized_mode': False, 'triangular_masking': True, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'return_tuple': True, 'mlp_after_attn': True}
Traceback (most recent call last):
  File "test_gpt_neo_ds.py", line 16, in <module>
    model.base_model = deepspeed.init_inference(
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/__init__.py", line 274, in init_inference
    engine = InferenceEngine(model,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 86, in __init__
    self._apply_injection_policy(
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 161, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 464, in replace_transformer_layer
    return replace_module(model=model,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 561, in replace_module
    replaced_module, _ = _replace_module(model, policy)
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 583, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 578, in _replace_module
    policies[child.__class__][0](child,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 453, in replace_fn
    new_module = replace_with_policy(child,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 290, in replace_with_policy
    attn_block.attn_ob = mp_replace.copy(attn_block.attn_ob.data, dense_b)
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1198, in __setattr__
    raise TypeError("cannot assign '{}' as parameter '{}' "
TypeError: cannot assign 'torch.FloatTensor' as parameter 'attn_ob' (torch.nn.Parameter or None expected)
Traceback (most recent call last):
  File "test_gpt_neo_ds.py", line 16, in <module>
    model.base_model = deepspeed.init_inference(
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/__init__.py", line 274, in init_inference
    engine = InferenceEngine(model,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 86, in __init__
    self._apply_injection_policy(
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 161, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 464, in replace_transformer_layer
    return replace_module(model=model,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 561, in replace_module
    replaced_module, _ = _replace_module(model, policy)
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 583, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 578, in _replace_module
    policies[child.__class__][0](child,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 453, in replace_fn
    new_module = replace_with_policy(child,
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 290, in replace_with_policy
    attn_block.attn_ob = mp_replace.copy(attn_block.attn_ob.data, dense_b)
  File "/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1198, in __setattr__
    raise TypeError("cannot assign '{}' as parameter '{}' "
TypeError: cannot assign 'torch.FloatTensor' as parameter 'attn_ob' (torch.nn.Parameter or None expected)
[2022-03-02 19:04:31,724] [INFO] [launch.py:160:sigkill_handler] Killing subprocess 69838
[2022-03-02 19:04:31,724] [INFO] [launch.py:160:sigkill_handler] Killing subprocess 69839
[2022-03-02 19:04:31,724] [ERROR] [launch.py:166:sigkill_handler] ['/home/ubuntu/anaconda3/envs/pytorch_p38/bin/python3.8', '-u', 'test_gpt_neo_ds.py', '--local_rank=1'] exits with return code = 1
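
For context on what this TypeError means: torch.nn.Module refuses to let a plain tensor be assigned over an attribute that is already registered as an nn.Parameter. The minimal sketch below is not DeepSpeed code, just an illustration with an ordinary nn.Linear layer; it reproduces the same error and shows the nn.Parameter wrapping that avoids it.

# Minimal illustration (not DeepSpeed code) of the TypeError above:
# torch.nn.Module.__setattr__ rejects a plain tensor assigned over an
# attribute that is already registered as an nn.Parameter.
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)

try:
    layer.bias = torch.zeros(4)            # plain FloatTensor -> TypeError
except TypeError as err:
    print(err)  # cannot assign 'torch.FloatTensor' as parameter 'bias' ...

layer.bias = nn.Parameter(torch.zeros(4))  # wrapping in nn.Parameter is accepted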

If I comment out the model.base_model = deepspeed.init_inference(...) call, everything works fine. Here is the output:

$ deepspeed --num_gpus 2 test_gpt_neo_ds.py
[2022-03-02 19:00:55,038] [WARNING] [runner.py:148:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2022-03-02 19:00:55,855] [INFO] [runner.py:420:main] cmd = /home/ubuntu/anaconda3/envs/pytorch_p38/bin/python3.8 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 test_gpt_neo_ds.py
[2022-03-02 19:00:57,056] [INFO] [launch.py:96:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2022-03-02 19:00:57,056] [INFO] [launch.py:102:main] nnodes=1, num_local_procs=2, node_rank=0
[2022-03-02 19:00:57,056] [INFO] [launch.py:115:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2022-03-02 19:00:57,056] [INFO] [launch.py:116:main] dist_world_size=2
[2022-03-02 19:00:57,056] [INFO] [launch.py:118:main] Setting CUDA_VISIBLE_DEVICES=0,1
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Generated Text - In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

“They were very friendly,” said lead author José Luis Cárdenas, an associate professor at the Universidad de Chile. “They were really nice to us.”

Many scientists have wondered about the existence of the unic
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[2022-03-02 19:01:32,131] [INFO] [launch.py:189:main] Process 68880 exits successfully.
Generated Text - In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

The study was funded by the Smithsonian Institution’s National Museum of Natural History and the National Geographic Society.

“I am very concerned that a unicorn could cause havoc around the world,” said Jeff Kallisch, executive director of the Smithsonian
[2022-03-02 19:01:35,135] [INFO] [launch.py:189:main] Process 68879 exits successfully.

ds_report output

$ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch']
torch version .................... 1.10.2+cu111
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/ubuntu/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.5.10, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.1

Screenshots
If applicable, add screenshots to help explain your problem.

System info (please complete the following information):

  • OS: Ubuntu 18.04.01
  • GPU count and types: machine with 8x A100 GPUs
  • Python version: 3.8
  • Any other relevant info about your setup

Launcher context
Launched with the deepspeed launcher: deepspeed --num_gpus 2 test_gpt_neo_ds.py

Docker context
Are you using a specific docker image that you can share?

Additional context
Add any other context about the problem here.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

vishalkg commented, Mar 3, 2022 (1 reaction)

Hi @RezaYazdaniAminabadi

I went ahead and updated the file as mentioned in the PR and am still getting the issue. I was using DeepSpeed version 0.5.10, which is the latest tag.

Then I tried installing DeepSpeed from the master branch (DeepSpeed info: version=0.6.0+c3c8d5d, git-hash=c3c8d5d, git-branch=master). After this the error didn’t appear. It seems the issue is fixed on master and will be included in the next release.
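
A quick way to confirm which DeepSpeed build is actually being imported is to print the version string at runtime; a minimal sketch, assuming a git-based install of the master branch (the pip command in the comment is an assumption, adjust to your environment):

# Sanity check of which DeepSpeed build is active. Installing from master
# might look like (assumed command, adjust to your setup):
#   pip install git+https://github.com/microsoft/DeepSpeed.git
import deepspeed
print(deepspeed.__version__)  # e.g. '0.6.0+c3c8d5d' for a master build, '0.5.10' for the release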

As far as PR #1805 is concerned, I can confirm that with it the results are consistent across GPUs.

Summary: the above-mentioned issue should be fixed in the next release.

I will close the issue.

vishalkg commented, Mar 3, 2022

Thanks for replying; I'm going to test the PR and will let you know.
