Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[REQUEST] Model serving via deepspeed's inference module

See original GitHub issue

Is your feature request related to a problem? Please describe.
No.

Describe the solution you’d like
I am trying to run my model-serving code in a model-parallel fashion. The tutorial shows how to run on multiple GPUs, but its data is predefined, which doesn’t work for serving. My original code uses fastapi for the serving work. When launching with deepspeed --num_gpus n example.py, the fastapi server is also started n times, which causes a port conflict.

Describe alternatives you’ve considered
Do I have to first start the model in parallel with deepspeed in one script, then start another script for fastapi, and finally connect the two somehow?

Additional context
None.
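For context on the port conflict: the deepspeed launcher execs the whole script once per GPU, so every rank reaches the server-startup call and tries to bind the same port. A minimal guard (hypothetical, not from the issue) lets only one rank bind it; a fuller sketch appears after the traceback below.

import os, uvicorn
from fastapi import FastAPI

app = FastAPI()  # stand-in app for illustration

# The deepspeed launcher sets LOCAL_RANK per process;
# letting only rank 0 call uvicorn.run avoids the port clash.
if int(os.getenv("LOCAL_RANK", "0")) == 0:
    uvicorn.run(app, host="0.0.0.0", port=8500)

On its own this is not enough for tensor parallelism, since the other ranks must still participate in every forward pass.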

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

9 reactions
david-rx commented, Jan 5, 2022

Any update on this? Is there another recommended way to do this - for instance, if we wanted to run with uvicorn and thus couldn’t use the deepspeed launcher?

2 reactions
callzhang commented, Nov 2, 2021

Here is the minimal code I tried:

from fastapi import FastAPI, Request, Response, Query
from transformers import pipeline
import deepspeed, torch, os, uvicorn

app = FastAPI()

# Rank and world size are injected by the deepspeed launcher, one process per GPU.
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='gpt2', device=local_rank)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_method='auto')


@app.get("/gen")
def generate(text):
    return generator(text, max_length=100)

# Only rank 0 should bind the port. Note that passing the app as an import
# string makes uvicorn re-import this module, so this module-level call runs
# a second time inside uvicorn's own event loop (see the traceback below).
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(f'initiating server on rank: {local_rank}')
    uvicorn.run(
        "min_example_deepspeed_mp:app",
        host="0.0.0.0", port=8500,
        log_level="info",
        workers=1
    )

Then I ran deepspeed --num_gpus 2 min_example_deepspeed_mp.py and I got the following error:

[2021-11-03 01:33:39,359] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-11-03 01:33:39,373] [INFO] [runner.py:360:main] cmd = /home/stardust/anaconda3/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 nlp/sentence_generation/min_example_deepspeed_mp.py
[2021-11-03 01:33:39,993] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2021-11-03 01:33:39,994] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=2, node_rank=0
[2021-11-03 01:33:39,994] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2021-11-03 01:33:39,994] [INFO] [launch.py:102:main] dist_world_size=2
[2021-11-03 01:33:39,994] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0,1
initiating server on rank: 1
initiating server on rank: 0
initiating server on rank: 0
Traceback (most recent call last):
  File "nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
    uvicorn.run(
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
    server.run()
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1501, in uvloop.loop.Loop.run_until_complete
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 76, in serve
    config.load()
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/config.py", line 448, in load
    self.loaded_app = import_from_string(self.app)
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/home/stardust/anaconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/stardust/algorithms-playground/nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
    uvicorn.run(
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
    server.run()
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 33, in run
    raise RuntimeError(
RuntimeError: asyncio.run() cannot be called from a running event loop
initiating server on rank: 1
Traceback (most recent call last):
  File "nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
    uvicorn.run(
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
    server.run()
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1501, in uvloop.loop.Loop.run_until_complete
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 76, in serve
    config.load()
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/config.py", line 448, in load
    self.loaded_app = import_from_string(self.app)
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/home/stardust/anaconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/stardust/algorithms-playground/nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
    uvicorn.run(
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
    server.run()
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 33, in run
    raise RuntimeError(
RuntimeError: asyncio.run() cannot be called from a running event loop
sys:1: RuntimeWarning: coroutine 'Server.serve' was never awaited
sys:1: RuntimeWarning: coroutine 'Server.serve' was never awaited
Killing subprocess 8312
Killing subprocess 8313
Traceback (most recent call last):
  File "/home/stardust/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/stardust/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 171, in <module>
    main()
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 161, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/stardust/anaconda3/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stardust/anaconda3/bin/python', '-u', 'nlp/sentence_generation/min_example_deepspeed_mp.py', '--local_rank=1']' returned non-zero exit status 1.
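The RuntimeError above comes from passing the app as an import string: uvicorn re-imports the module while its event loop is already running, so the module-level uvicorn.run fires again. Below is a hedged sketch of a workaround (not from this thread, and assuming deepspeed.init_inference initializes torch.distributed when mp_size > 1): pass the app object directly, bind the port only on rank 0, and keep the other ranks in a broadcast loop so every rank still joins each tensor-parallel forward pass.

from fastapi import FastAPI
from transformers import pipeline
import deepspeed, torch, os, uvicorn

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='gpt2', device=local_rank)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_method='auto')

def run_generation(text):
    # All ranks must run the forward pass together, otherwise the
    # tensor-parallel collectives inside the model block forever.
    if torch.distributed.is_initialized():
        payload = [text]
        torch.distributed.broadcast_object_list(payload, src=0)
        text = payload[0]
    return generator(text, max_length=100)

rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
if rank == 0:
    app = FastAPI()

    @app.get("/gen")
    def generate(text: str):
        return run_generation(text)

    # Passing the app object (not "module:app") keeps uvicorn from
    # re-importing this module inside its own running event loop.
    uvicorn.run(app, host="0.0.0.0", port=8500, log_level="info")
else:
    while True:
        # Worker ranks wait here and join each broadcast forward pass.
        run_generation(None)

This sketch handles one request at a time and ignores error handling; it is meant to illustrate the shape of a solution, not a production pattern.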
Read more comments on GitHub.

Top Results From Across the Web

Getting Started with DeepSpeed for Inferencing ...
DeepSpeed-Inference introduces several features to efficiently serve transformer-based PyTorch models. It supports model parallelism (MP) to fit large ...

Deploy large models on Amazon SageMaker using ...
DeepSpeed Inference supports large Transformer-based models with billions of parameters. It allows you to efficiently serve large models by ...

DeepSpeed: Accelerating large-scale model inference and ...
Inference-adapted parallelism allows users to efficiently serve large models by adapting to the best parallelism strategies for multi-GPU ...

ZeRO — DeepSpeed 0.8.0 documentation - Read the Docs
The Zero Redundancy Optimizer (ZeRO) removes the memory redundancies across data-parallel processes by partitioning the three model states (optimizer states ...

DeepSpeed Integration
DeepSpeed ZeRO-3 can be used for inference as well, since it allows huge models to be loaded on multiple GPUs, which won't be...
