[BUG] Cannot install DeepSpeed with DS_BUILD_OPS=1 or JIT-compile ops at runtime
Describe the bug
Cannot install DeepSpeed with DS_BUILD_OPS=1 or JIT-compile ops at runtime.

Environment:
PyTorch version: torch==1.10.0+cu113, torchvision==0.11.1+cu113
CUDA SDK version: nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
GCC version: gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
To Reproduce
Steps to reproduce the behavior:
- git clone https://github.com/microsoft/DeepSpeed
  cd DeepSpeed
  sudo DS_BUILD_OPS=1 …/pip install -e . --global-option="build_ext" --global-option="-g" --global-option="-j8" --no-cache -v --disable-pip-version-check
OR
- DS_BUILD_OPS=1 ./install.sh -s -n
OR
- Install from PyPI: DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-g" --global-option="-j8" --no-cache -v --disable-pip-version-check
All these actions result in roughly the same error:
[2021-11-08 18:19:01,584] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2021-11-08 18:19:01,589] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5, git-hash=unknown, git-branch=unknown
Using amp fp16 backend
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] initializing deepspeed groups
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] initializing deepspeed model parallel group with size 1
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] initializing deepspeed expert parallel group with size 1
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] creating expert data parallel process group with ranks: [0]
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] creating expert parallel process group with ranks: [0]
[2021-11-08 18:19:01,671] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
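Note: this tokenizers warning is incidental to the build failure. As the message itself suggests, it can be silenced by setting the variable before the training process forks, e.g.:

export TOKENIZERS_PARALLELISM=false   # must be set before the forking run starts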
Using /home/deepschneider/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/deepschneider/.cache/torch_extensions/py38_cu113/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/TH -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__SCALAR__ -c /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/TH -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__SCALAR__ -c /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
In file included from /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:1:
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h: In constructor ‘Adam_Optimizer::Adam_Optimizer(float, float, float, float, float, bool)’:
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:39:49: error: ‘TILE’ was not declared in this scope; did you mean ‘FILE’?
   39 |         cudaMallocHost((void**)_doubled_buffer, TILE * sizeof(float));
      |                                                 ^~~~
      |                                                 FILE
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp: In member function ‘void Adam_Optimizer::Step_1(float*, float*, float*, float*, size_t, __half*, bool)’:
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:51:61: error: ‘TILE’ was not declared in this scope; did you mean ‘FILE’?
   51 |         for (size_t t = rounded_size; t < _param_size; t += TILE) {
      |                                                              ^~~~
      |                                                              FILE
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/gpt_neo_xl_deepspeed.py", line 46, in <module>
    Trainer(model=model, args=training_args, train_dataset=train_dataset,
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/transformers/trainer.py", line 1157, in train
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/transformers/deepspeed.py", line 362, in deepspeed_init
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/__init__.py", line 131, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 223, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 882, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 962, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 83, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 362, in load
    return self.jit_load(verbose)
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 394, in jit_load
    op_module = load(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1124, in load
    return _jit_compile(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f2351a98280>
Traceback (most recent call last):
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 97, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
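For context, the compiler error points at a preprocessor-guard problem rather than a toolchain problem: the build runs with -D__SCALAR__ (no AVX macros defined), and TILE is reported as undeclared, which suggests the macro is only defined inside a SIMD-guarded region that a scalar build never enters. A minimal sketch of that failure mode follows; the header layout and the constant are illustrative assumptions, not quotes from the DeepSpeed sources:

// Hypothetical layout: TILE is only defined when an AVX path is selected.
#if defined(__AVX512__) || defined(__AVX256__)
#define TILE (128 * 1024 * 1024)  // illustrative value, not DeepSpeed's actual constant
#endif

// cpu_adam.h then uses TILE unconditionally (see the error above):
//     cudaMallocHost((void**)_doubled_buffer, TILE * sizeof(float));
// A -D__SCALAR__ build defines neither AVX macro, skips the #define,
// and fails with "'TILE' was not declared in this scope".
// Hoisting the definition out of the guard would sidestep the problem:
#ifndef TILE
#define TILE (128 * 1024 * 1024)  // assumed value for illustration
#endif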
Expected behavior
Successfully compiled ops.
ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch']
torch version .................... 1.10.0+cu113
torch cuda version ............... 11.3
nvcc version ..................... 11.3
deepspeed install path ........... ['/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.5.5, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3
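Note that the failing cpu_adam build can be reproduced without the full Trainer run. Based on the builder calls visible in the traceback above (CPUAdamBuilder().load()), a one-liner along these lines should hit the same ninja compile step (the import path is inferred from the traceback, so verify it against the installed version):

python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"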
System info (please complete the following information):
- OS: Ubuntu 20.04
- GPU: 1xRTX A6000
- Interconnects (if applicable): none
- Python version: 3.8.10
- Any other relevant info about your setup: none
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
Hugging Face Trainer
Top GitHub Comments
I used this in the DeepSpeed directory: "sudo DS_BUILD_OPS=1 pip install ."
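A narrower variant of the same workaround is DeepSpeed's per-op build flags; assuming DS_BUILD_CPU_ADAM behaves as documented for this version, prebuilding only the op that fails at JIT time would look like:

cd DeepSpeed
DS_BUILD_CPU_ADAM=1 pip install .   # build just the cpu_adam extension ahead of time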
Hi @dredwardhyde,
Thanks for reporting this. Please try this PR to see if it is resolved.
Thanks,
Reza