[BUG] DeepSpeed not compatible with latest transformers (4.22.1)

Describe the bug

I’m trying to get BLOOM to run on a Lambda Labs GPU cloud 8x40GB instance. It appears DeepSpeed isn’t compatible with the latest transformers: the server fails with ImportError: cannot import name 'cached_path' from 'transformers.utils'.
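One way to confirm which names are actually missing is a small diagnostic like the sketch below. It is not part of the original report; the helper name `missing_names` is hypothetical, and the list of names is copied from the failing import line in the traceback further down. It only assumes that transformers may or may not be importable in the current environment.

```python
import importlib


def missing_names(module_name, names):
    """Return the subset of `names` that the module does not export.

    Hypothetical helper for illustration; it imports the module and
    checks each attribute with hasattr.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # Names taken from the failing import in mii/models/providers/llm.py.
    wanted = ["WEIGHTS_NAME", "WEIGHTS_INDEX_NAME", "cached_path", "hf_bucket_url"]
    try:
        print("missing from transformers.utils:",
              missing_names("transformers.utils", wanted))
    except ImportError as exc:
        # transformers itself is not installed or cannot be imported here.
        print(f"could not import transformers.utils: {exc}")
```

If the printed list contains cached_path, the installed transformers release no longer exports that name, which matches the error reported here.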

I ran the following command:

python bloom-inference-server/cli.py --model_name microsoft/bloom-deepspeed-inference-int8 --dtype int8 --deployment_framework ds_inference --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'

Here is the output:

[2022-09-21 16:10:37,900] [INFO] [deployment.py:74:deploy] *************DeepSpeed Optimizations: True*************
[2022-09-21 16:10:40,808] [INFO] [server_client.py:206:_initialize_service] multi-gpu deepspeed launch: ['deepspeed', '--num_gpus', '8', '--no_local_rank', '--no_python', '/usr/bin/python', '-m', 'mii.launch.multi_gpu_server', '--task-name', 'text-generation', '--model', 'bigscience/bloom', '--model-path', '/home/ubuntu/.cache/huggingface/hub/models--microsoft--bloom-deepspeed-inference-int8/snapshots/aa00a6626f6484a2eef68e06d1e089e4e32aa571', '--port', '50950', '--ds-optimize', '--provider', 'hugging-face-llm', '--config', 'eyJ0ZW5zb3JfcGFyYWxsZWwiOiA4LCAicG9ydF9udW1iZXIiOiA1MDk1MCwgImR0eXBlIjogImludDgiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IHsiY2hlY2twb2ludHMiOiB7Im5vbl90cCI6IFsibm9uLXRwLnB0Il0sICJ0cCI6IFsidHBfMDBfMDAucHQiLCAidHBfMDFfMDAucHQiLCAidHBfMDJfMDAucHQiLCAidHBfMDNfMDAucHQiLCAidHBfMDBfMDEucHQiLCAidHBfMDFfMDEucHQiLCAidHBfMDJfMDEucHQiLCAidHBfMDNfMDEucHQiLCAidHBfMDBfMDIucHQiLCAidHBfMDFfMDIucHQiLCAidHBfMDJfMDIucHQiLCAidHBfMDNfMDIucHQiLCAidHBfMDBfMDMucHQiLCAidHBfMDFfMDMucHQiLCAidHBfMDJfMDMucHQiLCAidHBfMDNfMDMucHQiLCAidHBfMDBfMDQucHQiLCAidHBfMDFfMDQucHQiLCAidHBfMDJfMDQucHQiLCAidHBfMDNfMDQucHQiLCAidHBfMDBfMDUucHQiLCAidHBfMDFfMDUucHQiLCAidHBfMDJfMDUucHQiLCAidHBfMDNfMDUucHQiLCAidHBfMDBfMDYucHQiLCAidHBfMDFfMDYucHQiLCAidHBfMDJfMDYucHQiLCAidHBfMDNfMDYucHQiLCAidHBfMDBfMDcucHQiLCAidHBfMDFfMDcucHQiLCAidHBfMDJfMDcucHQiLCAidHBfMDNfMDcucHQiXX0sICJkdHlwZSI6ICJpbnQ4IiwgInBhcmFsbGVsaXphdGlvbiI6ICJ0cCIsICJ0cF9zaXplIjogNCwgInR5cGUiOiAiQkxPT00iLCAidmVyc2lvbiI6IDF9fQ==']
[2022-09-21 16:10:41,887] [WARNING] [runner.py:178:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2022-09-21 16:10:42,184] [INFO] [runner.py:504:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --no_python --no_local_rank /usr/bin/python -m mii.launch.multi_gpu_server --task-name text-generation --model bigscience/bloom --model-path /home/ubuntu/.cache/huggingface/hub/models--microsoft--bloom-deepspeed-inference-int8/snapshots/aa00a6626f6484a2eef68e06d1e089e4e32aa571 --port 50950 --ds-optimize --provider hugging-face-llm --config eyJ0ZW5zb3JfcGFyYWxsZWwiOiA4LCAicG9ydF9udW1iZXIiOiA1MDk1MCwgImR0eXBlIjogImludDgiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IHsiY2hlY2twb2ludHMiOiB7Im5vbl90cCI6IFsibm9uLXRwLnB0Il0sICJ0cCI6IFsidHBfMDBfMDAucHQiLCAidHBfMDFfMDAucHQiLCAidHBfMDJfMDAucHQiLCAidHBfMDNfMDAucHQiLCAidHBfMDBfMDEucHQiLCAidHBfMDFfMDEucHQiLCAidHBfMDJfMDEucHQiLCAidHBfMDNfMDEucHQiLCAidHBfMDBfMDIucHQiLCAidHBfMDFfMDIucHQiLCAidHBfMDJfMDIucHQiLCAidHBfMDNfMDIucHQiLCAidHBfMDBfMDMucHQiLCAidHBfMDFfMDMucHQiLCAidHBfMDJfMDMucHQiLCAidHBfMDNfMDMucHQiLCAidHBfMDBfMDQucHQiLCAidHBfMDFfMDQucHQiLCAidHBfMDJfMDQucHQiLCAidHBfMDNfMDQucHQiLCAidHBfMDBfMDUucHQiLCAidHBfMDFfMDUucHQiLCAidHBfMDJfMDUucHQiLCAidHBfMDNfMDUucHQiLCAidHBfMDBfMDYucHQiLCAidHBfMDFfMDYucHQiLCAidHBfMDJfMDYucHQiLCAidHBfMDNfMDYucHQiLCAidHBfMDBfMDcucHQiLCAidHBfMDFfMDcucHQiLCAidHBfMDJfMDcucHQiLCAidHBfMDNfMDcucHQiXX0sICJkdHlwZSI6ICJpbnQ4IiwgInBhcmFsbGVsaXphdGlvbiI6ICJ0cCIsICJ0cF9zaXplIjogNCwgInR5cGUiOiAiQkxPT00iLCAidmVyc2lvbiI6IDF9fQ==
[2022-09-21 16:10:43,214] [INFO] [launch.py:136:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2022-09-21 16:10:43,214] [INFO] [launch.py:142:main] nnodes=1, num_local_procs=8, node_rank=0
[2022-09-21 16:10:43,214] [INFO] [launch.py:155:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2022-09-21 16:10:43,214] [INFO] [launch.py:156:main] dist_world_size=8
[2022-09-21 16:10:43,214] [INFO] [launch.py:158:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2022-09-21 16:10:45,832] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start...
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            129-159-32-184
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4122

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           129-159-32-184
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       udcm
--------------------------------------------------------------------------
[The same pair of Open MPI messages is printed once per rank; the 7 further copies are omitted.]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/launch/multi_gpu_server.py", line 70, in <module>
    main()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/launch/multi_gpu_server.py", line 56, in main
    inference_pipeline = load_models(task_name=args.task_name,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/models/load_models.py", line 45, in load_models
    from mii.models.providers.llm import load_hf_llm
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/models/providers/llm.py", line 8, in <module>
    from transformers.utils import WEIGHTS_NAME, WEIGHTS_INDEX_NAME, cached_path, hf_bucket_url
ImportError: cannot import name 'cached_path' from 'transformers.utils' (/home/ubuntu/.local/lib/python3.8/site-packages/transformers/utils/__init__.py)
[The same traceback is printed once per rank; the 7 further copies are omitted.]
[2022-09-21 16:10:50,262] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83474
[2022-09-21 16:10:50,349] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83475
[2022-09-21 16:10:50,422] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83476
[2022-09-21 16:10:50,422] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83477
[2022-09-21 16:10:50,493] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83478
[2022-09-21 16:10:50,565] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83479
[2022-09-21 16:10:50,582] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83480
[2022-09-21 16:10:50,657] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83481
[2022-09-21 16:10:50,769] [ERROR] [launch.py:292:sigkill_handler] ['/usr/bin/python', '-m', 'mii.launch.multi_gpu_server', '--task-name', 'text-generation', '--model', 'bigscience/bloom', '--model-path', '/home/ubuntu/.cache/huggingface/hub/models--microsoft--bloom-deepspeed-inference-int8/snapshots/aa00a6626f6484a2eef68e06d1e089e4e32aa571', '--port', '50950', '--ds-optimize', '--provider', 'hugging-face-llm', '--config', 'eyJ0ZW5zb3JfcGFyYWxsZWwiOiA4LCAicG9ydF9udW1iZXIiOiA1MDk1MCwgImR0eXBlIjogImludDgiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IHsiY2hlY2twb2ludHMiOiB7Im5vbl90cCI6IFsibm9uLXRwLnB0Il0sICJ0cCI6IFsidHBfMDBfMDAucHQiLCAidHBfMDFfMDAucHQiLCAidHBfMDJfMDAucHQiLCAidHBfMDNfMDAucHQiLCAidHBfMDBfMDEucHQiLCAidHBfMDFfMDEucHQiLCAidHBfMDJfMDEucHQiLCAidHBfMDNfMDEucHQiLCAidHBfMDBfMDIucHQiLCAidHBfMDFfMDIucHQiLCAidHBfMDJfMDIucHQiLCAidHBfMDNfMDIucHQiLCAidHBfMDBfMDMucHQiLCAidHBfMDFfMDMucHQiLCAidHBfMDJfMDMucHQiLCAidHBfMDNfMDMucHQiLCAidHBfMDBfMDQucHQiLCAidHBfMDFfMDQucHQiLCAidHBfMDJfMDQucHQiLCAidHBfMDNfMDQucHQiLCAidHBfMDBfMDUucHQiLCAidHBfMDFfMDUucHQiLCAidHBfMDJfMDUucHQiLCAidHBfMDNfMDUucHQiLCAidHBfMDBfMDYucHQiLCAidHBfMDFfMDYucHQiLCAidHBfMDJfMDYucHQiLCAidHBfMDNfMDYucHQiLCAidHBfMDBfMDcucHQiLCAidHBfMDFfMDcucHQiLCAidHBfMDJfMDcucHQiLCAidHBfMDNfMDcucHQiXX0sICJkdHlwZSI6ICJpbnQ4IiwgInBhcmFsbGVsaXphdGlvbiI6ICJ0cCIsICJ0cF9zaXplIjogNCwgInR5cGUiOiAiQkxPT00iLCAidmVyc2lvbiI6IDF9fQ=='] exits with return code = 1
[2022-09-21 16:10:50,837] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
  File "bloom-inference-server/cli.py", line 63, in <module>
    main()
  File "bloom-inference-server/cli.py", line 26, in main
    model = get_model_class(args.deployment_framework)(args)
  File "/home/ubuntu/transformers-bloom-inference/bloom-inference-server/models/ds_inference.py", line 92, in __init__
    mii.deploy(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/deployment.py", line 94, in deploy
    return _deploy_local(deployment_name, model_path=model_path)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/deployment.py", line 100, in _deploy_local
    mii.utils.import_score_file(deployment_name).init()
  File "/tmp/mii_cache/ds_inference_grpc_server/score.py", line 29, in init
    model = mii.MIIServerClient(task,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/server_client.py", line 90, in __init__
    self._wait_until_server_is_live()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/server_client.py", line 113, in _wait_until_server_is_live
    raise RuntimeError("server crashed for some reason, unable to proceed")
RuntimeError: server crashed for some reason, unable to proceed
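The long opaque arguments in the log above (--world_info and --config) appear to be base64-encoded JSON. The sketch below decodes the --world_info value copied from the runner.py line; the helper name `decode_launcher_arg` is hypothetical, and the URL-safe base64 alphabet is an assumption (this particular string decodes the same way with the standard alphabet).

```python
import base64
import json


def decode_launcher_arg(encoded):
    """Decode a base64-encoded JSON argument (hypothetical helper)."""
    return json.loads(base64.urlsafe_b64decode(encoded))


# Value of --world_info copied verbatim from the log above.
world_info = decode_launcher_arg(
    "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119"
)
print(world_info)  # {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
```

The result matches the WORLD INFO DICT line in the log. The same function can be applied to the --config value to inspect the settings passed to the MII server.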

To Reproduce

Steps to reproduce the behavior:

pip install "deepspeed>=0.7.3" "transformers>=4.21.3" "accelerate>=0.12.0" bitsandbytes "protobuf==3.20.*"
git clone https://github.com/huggingface/transformers-bloom-inference.git
cd transformers-bloom-inference
pip install .
cd ..
git clone https://github.com/microsoft/DeepSpeed-MII
cd DeepSpeed-MII
pip install .
cd ..
cd transformers-bloom-inference
python bloom-inference-server/cli.py --model_name microsoft/bloom-deepspeed-inference-int8 --dtype int8 --deployment_framework ds_inference --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'
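Because the install step uses minimum-version specifiers, pip resolves to whatever release is newest at install time, so it helps to record what was actually installed. The following is a minimal sketch using the standard-library importlib.metadata; the function name `installed_versions` is hypothetical and the package list is only the subset named in the pip command above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip install line above.
    report = installed_versions(["deepspeed", "transformers", "accelerate"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Including this output in a report makes it clear which transformers release the failing import ran against.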

Expected behavior

The bloom-inference-server should run without crashing.

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/lib/python3/dist-packages/torch']
torch version .................... 1.11.0
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.3, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.6

Screenshots None

System info (please complete the following information):

OS: Ubuntu 20.04.5 LTS
GPU count and types: 1 machine with 8x40GB A100s
Python version: Python 3.8.10
Any other relevant info about your setup: Lambda Labs GPU cloud

Launcher context

I am running bloom-inference-server/cli.py from the transformers-bloom-inference repository, which uses the DeepSpeed launcher.

Docker context N/A

Additional context None

Top GitHub Comments

tjarmain commented, Oct 31, 2022

Thanks @mrwyattii, I was able to get it working 😃

mrwyattii commented, Oct 31, 2022

Closing, but please re-open if you have any additional questions!

Top Results From Across the Web

Support latest Transformers and new cache design #69 - GitHub

Support latest Transformers and new cache design by mrwyattii · Pull Request #69 ... [BUG] DeepSpeed not compatible with latest transformers (4.22.1) ...
