
Error with ninja building "sparse_attn"

See original GitHub issue

Environment: Python 3.6.7, PyTorch 1.5.0, torchvision 0.6.0, CUDA 10.2, GCC 5.4.0, Ubuntu 16.04.5 LTS

Using /tmp/torch_extensions as PyTorch extensions root...
Emitting ninja build file /tmp/torch_extensions/sparse_attn/build.ninja...
Building extension module sparse_attn...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF utils.o.d -DTORCH_EXTENSION_NAME=sparse_attn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O2 -fopenmp -c /usr/local/lib/python3.6/dist-packages/deepspeed/ops/csrc/sparse_attention/utils.cpp -o utils.o 
FAILED: utils.o 
c++ -MMD -MF utils.o.d -DTORCH_EXTENSION_NAME=sparse_attn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O2 -fopenmp -c /usr/local/lib/python3.6/dist-packages/deepspeed/ops/csrc/sparse_attention/utils.cpp -o utils.o 
/usr/local/lib/python3.6/dist-packages/deepspeed/ops/csrc/sparse_attention/utils.cpp: In function 'void segment_blocks(at::Tensor, at::Tensor, at::Tensor, int, ret_t&)':
/usr/local/lib/python3.6/dist-packages/deepspeed/ops/csrc/sparse_attention/utils.cpp:87:71: error: converting to 'std::vector<std::tuple<int, at::Tensor> >::value_type {aka std::tuple<int, at::Tensor>}' from initializer list would use explicit constructor 'constexpr std::tuple<_T1, _T2>::tuple(_U1&&, _U2&&) [with _U1 = int&; _U2 = at::Tensor; <template-parameter-2-3> = void; _T1 = int; _T2 = at::Tensor]'
     if (!to_cat.empty()) ret.push_back({max_width, torch::cat(to_cat)});
                                                                       ^
/usr/local/lib/python3.6/dist-packages/deepspeed/ops/csrc/sparse_attention/utils.cpp: In function 'ret_t sdd_segment(at::Tensor, int)':
/usr/local/lib/python3.6/dist-packages/deepspeed/ops/csrc/sparse_attention/utils.cpp:110:90: warning: narrowing conversion of 'H' from 'size_t {aka long unsigned int}' to 'long int' inside { } [-Wnarrowing]
     torch::Tensor scratch = torch::empty({H, layout.sum().item<int>(), 4}, layout.dtype());
                                                                                          ^
/usr/local/lib/python3.6/dist-packages/deepspeed/ops/csrc/sparse_attention/utils.cpp:110:90: warning: narrowing conversion of 'H' from 'size_t {aka long unsigned int}' to 'long int' inside { } [-Wnarrowing]
ninja: build stopped: subcommand failed.

and

ERROR [01/26 16:37:15 fastreid.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1400, in _run_ninja_build
    check=True)
  File "/usr/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./fastreid/engine/train_loop.py", line 121, in train
    self.run_step()
  File "./fastreid/engine/train_loop.py", line 200, in run_step
    outputs = self.model(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "./fastreid/modeling/meta_arch/baseline.py", line 58, in forward
    features = self.backbone(images)  # (bs, 2048, 16, 8)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "./fastreid/modeling/backbones/sparse_transformer.py", line 415, in forward
    x = self.forward_features(x)
  File "./fastreid/modeling/backbones/sparse_transformer.py", line 408, in forward_features
    x = blk(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "./fastreid/modeling/backbones/sparse_transformer.py", line 257, in forward
    x = x + self.drop_path(self.attn(self.norm1(x)))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "./fastreid/modeling/backbones/sparse_transformer.py", line 235, in forward
    x = self.sparse_self_attn(q, k, v).transpose(1, 2).reshape(B, N, C)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/ops/sparse_attention/sparse_self_attention.py", line 152, in forward
    attn_output_weights = sparse_dot_sdd_nt(query, key)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/ops/sparse_attention/matmul.py", line 712, in __call__
    db_lut, db_num_locks, db_width, db_packs = self.make_lut(a.dtype, a.device)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/ops/sparse_attention/matmul.py", line 634, in make_lut
    c_lut, c_num_locks, c_width, c_packs = _sparse_matmul.make_sdd_lut(layout, block, dtype, device)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/ops/sparse_attention/matmul.py", line 99, in make_sdd_lut
    _sparse_matmul._load_utils()
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/ops/sparse_attention/matmul.py", line 94, in _load_utils
    _sparse_matmul.cpp_utils = SparseAttnBuilder().load()
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/ops/op_builder/builder.py", line 180, in load
    return self.jit_load(verbose)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/ops/op_builder/builder.py", line 216, in jit_load
    verbose=verbose)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 898, in load
    is_python_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1086, in _jit_compile
    with_cuda=with_cuda)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1186, in _write_ninja_file_and_build_library
    error_prefix="Error building extension '{}'".format(name))
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1413, in _run_ninja_build
    raise RuntimeError(message)
RuntimeError: Error building extension 'sparse_attn'

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
zxy2020 commented, Jul 29, 2021

Is the gcc version too low?

Thanks, it worked after I updated gcc and other components.

Thanks, I updated gcc and g++ from 5.4 to 7.5 and then it succeeded, using:

  • sudo apt-get install gcc-7 g++-7
  • cd /usr/bin
  • sudo rm gcc
  • sudo ln -s gcc-7 gcc
  • sudo rm g++
  • sudo ln -s g++-7 g++
0 reactions
vick-wuwei commented, Mar 16, 2021

Can you try pre-compiling the op instead of using ninja? You can do this by re-installing with DS_BUILD_SPARSE_ATTN=1 pip install . from within the source directory (or replace . with deepspeed to install from PyPI).

Thanks, it worked after I updated gcc and other components.

Read more comments on GitHub >

Top Results From Across the Web

Ninja failed to build due to "The system cannot find the file ...
The command line I used to build Chromium is "ninja -C out/Debug chrome" (in src folder). Is there any suggestions to fix this...
Error occurs in Ninja while building CMake - Stack Overflow
Build command failed. Error while executing process D:\install\sdk\cmake\3.10.2.4988404\bin\ninja.exe with arguments {-C D:\Android projects\ ...
Ninja build fails because of gcc - Reddit
And I have been constantly getting this error: ... Failed to preprocess host compiler properties. ninja: build stopped: subcommand failed.
sparse-attn/support-latest-triton · mirrors / microsoft / DeepSpeed
... be built just-in-time (JIT) using torch's JIT C++ extension loader that relies on ninja to build and dynamically link them at runtime....
Ninja Install Fail - ReScript Forum
Here is the gigantic error. ... Error: Command failed: ... No prebuilt Ninja, building Ninja now npm ERR! bootstrapping ninja... npm ERR!
