[BUG] Cannot install DeepSpeed with DS_BUILD_OPS=1 or JIT-compile ops at runtime
Describe the bug
Cannot install DeepSpeed with DS_BUILD_OPS=1 or JIT-compile ops at runtime.

Environment:
PyTorch version: torch==1.10.0+cu113, torchvision==0.11.1+cu113
CUDA SDK version: nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
GCC version: gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
To Reproduce
Steps to reproduce the behavior:
- git clone https://github.com/microsoft/DeepSpeed
  cd DeepSpeed
  sudo DS_BUILD_OPS=1 …/pip install -e . --global-option="build_ext" --global-option="-g" --global-option="-j8" --no-cache -v --disable-pip-version-check
OR
- DS_BUILD_OPS=1 ./install.sh -s -n
OR
- Install from PyPI: DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-g" --global-option="-j8" --no-cache -v --disable-pip-version-check
All these actions result in roughly the same error:
[2021-11-08 18:19:01,584] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2021-11-08 18:19:01,589] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5, git-hash=unknown, git-branch=unknown
Using amp fp16 backend
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] initializing deepspeed groups
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] initializing deepspeed model parallel group with size 1
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] initializing deepspeed expert parallel group with size 1
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] creating expert data parallel process group with ranks: [0]
[2021-11-08 18:19:01,598] [INFO] [logging.py:68:log_dist] [Rank 0] creating expert parallel process group with ranks: [0]
[2021-11-08 18:19:01,671] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
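Note: this tokenizers warning is incidental to the build failure. As the message itself suggests, it can be silenced by setting the variable before the training process forks, e.g.:

export TOKENIZERS_PARALLELISM=false   # must be set before the forking run starts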
Using /home/deepschneider/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/deepschneider/.cache/torch_extensions/py38_cu113/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/TH -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__SCALAR__ -c /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/TH -isystem /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -L/usr/local/cuda/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__SCALAR__ -c /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
In file included from /home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:1:
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h: In constructor ‘Adam_Optimizer::Adam_Optimizer(float, float, float, float, float, bool)’:
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:39:49: error: ‘TILE’ was not declared in this scope; did you mean ‘FILE’?
   39 |         cudaMallocHost((void**)_doubled_buffer, TILE * sizeof(float));
      |                                                 ^~~~
      |                                                 FILE
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp: In member function ‘void Adam_Optimizer::Step_1(float*, float*, float*, float*, size_t, __half*, bool)’:
/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:51:61: error: ‘TILE’ was not declared in this scope; did you mean ‘FILE’?
   51 |         for (size_t t = rounded_size; t < _param_size; t += TILE) {
      |                                                              ^~~~
      |                                                              FILE
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/gpt_neo_xl_deepspeed.py", line 46, in <module>
    Trainer(model=model, args=training_args, train_dataset=train_dataset,
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/transformers/trainer.py", line 1157, in train
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/transformers/deepspeed.py", line 362, in deepspeed_init
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/__init__.py", line 131, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 223, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 882, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 962, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 83, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 362, in load
    return self.jit_load(verbose)
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 394, in jit_load
    op_module = load(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1124, in load
    return _jit_compile(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f2351a98280>
Traceback (most recent call last):
  File "/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 97, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
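For context, the compiler error points at a preprocessor-guard problem rather than a toolchain problem: the build runs with -D__SCALAR__ (no AVX macros defined), and TILE is reported as undeclared, which suggests the macro is only defined inside a SIMD-guarded region that a scalar build never enters. A minimal sketch of that failure mode follows; the header layout and the constant are illustrative assumptions, not quotes from the DeepSpeed sources:

// Hypothetical layout: TILE is only defined when an AVX path is selected.
#if defined(__AVX512__) || defined(__AVX256__)
#define TILE (128 * 1024 * 1024)  // illustrative value, not DeepSpeed's actual constant
#endif

// cpu_adam.h then uses TILE unconditionally (see the error above):
//     cudaMallocHost((void**)_doubled_buffer, TILE * sizeof(float));
// A -D__SCALAR__ build defines neither AVX macro, skips the #define,
// and fails with "'TILE' was not declared in this scope".
// Hoisting the definition out of the guard would sidestep the problem:
#ifndef TILE
#define TILE (128 * 1024 * 1024)  // assumed value for illustration
#endif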
Expected behavior
Successfully compiled ops.
ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch']
torch version .................... 1.10.0+cu113
torch cuda version ............... 11.3
nvcc version ..................... 11.3
deepspeed install path ........... ['/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.5.5, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3
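Note that the failing cpu_adam build can be reproduced without the full Trainer run. Based on the builder calls visible in the traceback above (CPUAdamBuilder().load()), a one-liner along these lines should hit the same ninja compile step (the import path is inferred from the traceback, so verify it against the installed version):

python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"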
System info (please complete the following information):
- OS: Ubuntu 20.04
- GPU: 1xRTX A6000
- Interconnects (if applicable): none
- Python version: 3.8.10
- Any other relevant info about your setup: none
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
Hugging Face Trainer
Top GitHub Comments
I used this in the DeepSpeed directory: "sudo DS_BUILD_OPS=1 pip install ."
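A narrower variant of the same workaround is DeepSpeed's per-op build flags; assuming DS_BUILD_CPU_ADAM behaves as documented for this version, prebuilding only the op that fails at JIT time would look like:

cd DeepSpeed
DS_BUILD_CPU_ADAM=1 pip install .   # build just the cpu_adam extension ahead of time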
Hi @dredwardhyde,
Thanks for reporting this. Please try this PR to see if it is resolved.
Thanks,
Reza