Error in building Transformer kernel

I am using the deepspeed/deepspeed:latest container (I also tried installing DeepSpeed with DS_BUILD_OPS=1 pip install deepspeed, but got the same error) and am trying to use the Transformer kernel provided by DeepSpeed as follows:

from deepspeed import DeepSpeedTransformerLayer, DeepSpeedTransformerConfig

if __name__ == "__main__":
    transformer_config = DeepSpeedTransformerConfig(
        batch_size=40,
        hidden_size=768,
        heads=768 // 64,
        intermediate_size=768 * 4,
        attn_dropout_ratio=0.0,
        hidden_dropout_ratio=0.0,
        num_hidden_layers=4,
        initializer_range=0.02,
        fp16=True,
        pre_layer_norm=True,
        stochastic_mode=True,
    )
    layer = DeepSpeedTransformerLayer(config=transformer_config)

However, initializing the layer fails with the following error:

DeepSpeed Transformer config is  {'layer_id': 0, 'batch_size': 40, 'hidden_size': 768, 'intermediate_size': 3072, 'heads': 12, 'attn_dropout_ratio': 0.0, 'hidden_dropout_ratio': 0.0, 'num_hidden_layers': 4, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': -1, 'seed': -1, 'normalize_invertible': False, 'gelu_checkpoint': False, 'adjust_init_range': True, 'test_gemm': False, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': False, 'stochastic_mode': True, 'huggingface': False}
Using /root/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/stochastic_transformer/build.ninja...
Building extension module stochastic_transformer...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/cublas_wrappers.cu -o cublas_wrappers.cuda.o
[2/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu -o dropout_kernels.cuda.o
FAILED: dropout_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu -o dropout_kernels.cuda.o
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(102): error: no operator "*" matches these operands
            operand types are: __half2 * const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(103): error: no operator "*" matches these operands
            operand types are: __half2 * const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(216): error: no operator "*" matches these operands
            operand types are: __half2 * const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(217): error: no operator "*" matches these operands
            operand types are: __half2 * const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(335): error: no operator "*" matches these operands
            operand types are: __half2 * const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(336): error: no operator "*" matches these operands
            operand types are: __half2 * const __half2

6 errors detected in the compilation of "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu".
[3/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu -o normalize_kernels.cuda.o
FAILED: normalize_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu -o normalize_kernels.cuda.o
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(880): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(883): error: no operator "-" matches these operands
            operand types are: const __half2 - const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(885): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(890): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(892): error: no operator "-" matches these operands
            operand types are: const __half2 - const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(893): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(901): error: no operator "*" matches these operands
            operand types are: __half2 * __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(901): error: identifier "h2sqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(905): error: identifier "h2rsqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(927): error: no operator "-" matches these operands
            operand types are: - __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1189): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1194): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1205): error: no operator "-" matches these operands
            operand types are: const __half2 - __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1206): error: no operator "*" matches these operands
            operand types are: __half2 * __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1210): error: identifier "h2rsqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1232): error: no operator "-" matches these operands
            operand types are: - __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1232): error: identifier "h2rsqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1621): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1624): error: no operator "-" matches these operands
            operand types are: const __half2 - const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1626): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1631): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1633): error: no operator "-" matches these operands
            operand types are: const __half2 - const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1634): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1642): error: no operator "*" matches these operands
            operand types are: __half2 * __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1642): error: identifier "h2sqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1646): error: identifier "h2rsqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1668): error: no operator "-" matches these operands
            operand types are: - __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1703): error: no operator "+" matches these operands
            operand types are: __half2 + const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1710): error: no operator "+" matches these operands
            operand types are: __half2 + const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1940): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1946): error: no operator "*=" matches these operands
            operand types are: __half2 *= __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1959): error: no operator "-" matches these operands
            operand types are: __half2 - __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1960): error: no operator "*" matches these operands
            operand types are: __half2 * __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1964): error: identifier "h2rsqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1986): error: no operator "-" matches these operands
            operand types are: - __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1986): error: identifier "h2rsqrt" is undefined

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(2021): error: no operator "+" matches these operands
            operand types are: __half2 + const __half2

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(2027): error: no operator "+" matches these operands
            operand types are: __half2 + const __half2

38 errors detected in the compilation of "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu".
[4/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/general_kernels.cu -o general_kernels.cuda.o
[5/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/transform_kernels.cu -o transform_kernels.cuda.o
[6/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/gelu_kernels.cu -o gelu_kernels.cuda.o
[7/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/softmax_kernels.cu -o softmax_kernels.cuda.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1549, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "experimentation.py", line 17, in <module>
    layer = DeepSpeedTransformerLayer(config=transformer_config)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/transformer.py", line 543, in __init__
    stochastic_transformer_cuda_module = StochasticTransformerBuilder().load()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 180, in load
    return self.jit_load(verbose)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 208, in jit_load
    op_module = load(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 999, in load
    return _jit_compile(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1204, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1308, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1565, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'stochastic_transformer'
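
A note on reading the log above: every nvcc command includes -gencode=arch=compute_52,code=sm_52. To my knowledge, CUDA's cuda_fp16 headers only provide the __half2 arithmetic operators and intrinsics such as h2sqrt and h2rsqrt for compute capability 5.3 and higher. That would be consistent with the "no operator" and "identifier is undefined" errors, but it is an assumption, not something the issue confirms. Below is a minimal Python sketch that lists the architectures requested in an nvcc command line and flags those below 5.3. The names gencode_archs, archs_below_half2_support, MIN_HALF2_ARCH and SAMPLE_FLAGS are hypothetical, made up for illustration.

```python
import re
from typing import List

# Assumption: native __half2 arithmetic needs compute capability >= 5.3.
MIN_HALF2_ARCH = 53

# The -gencode flags copied from the nvcc commands in the log above.
SAMPLE_FLAGS = (
    "-gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 "
    "-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 "
    "-gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 "
    "-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 "
    "-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80"
)


def gencode_archs(nvcc_command: str) -> List[int]:
    """Return the sorted, de-duplicated virtual architectures named in -gencode flags."""
    found = re.findall(r"-gencode=arch=compute_(\d+),code=", nvcc_command)
    return sorted({int(arch) for arch in found})


def archs_below_half2_support(nvcc_command: str) -> List[int]:
    """Return the requested architectures that fall below MIN_HALF2_ARCH."""
    return [arch for arch in gencode_archs(nvcc_command) if arch < MIN_HALF2_ARCH]


if __name__ == "__main__":
    print(gencode_archs(SAMPLE_FLAGS))              # [52, 60, 61, 70, 75, 80, 86]
    print(archs_below_half2_support(SAMPLE_FLAGS))  # [52]
```

If 5.2 does turn out to be the culprit, restricting the architecture list to the GPU actually in use (for instance via PyTorch's TORCH_CUDA_ARCH_LIST environment variable) may be worth trying. Whether DeepSpeed's JIT builder honors that variable in this version is not something I have verified.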