test_sparse_attention.py failed with CUDA 11.0
Hi, I'm running into an issue building DeepSpeed with Sparse Attention in a V100 + CUDA 11.0 environment. Specifically, after removing this skip flag (https://github.com/microsoft/DeepSpeed/blob/2660cc4dd43348306c58913775ceb4878379abe5/tests/unit/test_sparse_attention.py#L243), I ran into the following errors with pytest tests/unit/test_sparse_attention.py -sv:
======================================================================================================= test session starts =======================================================================================================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /private/home/songweig/anaconda3/envs/deepspeed/bin/python
cachedir: .pytest_cache
rootdir: /private/home/songweig/projects/DeepSpeed
collected 54 items
tests/unit/test_sparse_attention.py::test_sparse_attention_module_availability PASSED
tests/unit/test_sparse_attention.py::test_matmul_module_availability PASSED
tests/unit/test_sparse_attention.py::test_softmax_module_availability PASSED
tests/unit/test_sparse_attention.py::test_sparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_densesparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_fixedsparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_variablesparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_bigbirdsparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_bslongformersparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_sparseselfattention_module_availability PASSED
tests/unit/test_sparse_attention.py::test_bertsparseselfattention_module_availability PASSED
tests/unit/test_sparse_attention.py::test_sparseattentionutils_availability PASSED
tests/unit/test_sparse_attention.py::test_cpp_utils_availability Sparse Attention cpp_utils Module is not installed!
PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-256-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-256-32] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-576-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-576-32] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-256-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-256-32] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-576-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-576-32] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype0-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype1-sdd-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype2-sdd-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype3-sdd-True-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype4-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype5-dsd-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype6-dsd-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype7-dsd-True-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype8-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype9-dds-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype10-dds-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype11-dds-True-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype12-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype13-sdd-False-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype14-sdd-True-False] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype15-sdd-True-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype16-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype17-dsd-False-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype18-dsd-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype19-dsd-True-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype20-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype21-dds-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype22-dds-True-False] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype23-dds-True-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype24-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype25-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype26-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[32-dtype27-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[32-dtype28-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[32-dtype29-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[64-dtype30-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[64-dtype31-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[64-dtype32-dds-False-False] PASSED
============================================================================================================ FAILURES =============================================================================================================
_____________________________________________________________________________________________ test_matmul[16-dtype13-sdd-False-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'sdd', trans_a = False, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
assert allclose(ref_dx, st_dx)
> assert allclose(ref_dw, st_dw)
E AssertionError: assert False
E + where False = allclose(tensor([[[[10.6649, 8.8779, 13.0814, ..., 9.5998, 10.8293, 11.3977],\n [10.1814, 9.0207, 11.7046, ..., 10..., 7.6252, 6.5125],\n [ 6.6710, 6.8485, 8.2308, ..., 6.6604, 8.7260, 7.5133]]]],\n device='cuda:0'), tensor([[[[10.6649, 10.1814, 9.6523, ..., 12.4634, 10.5354, 12.3890],\n [ 8.8779, 9.0207, 7.7923, ..., 10..., 7.6252, 8.7260],\n [ 8.8778, 9.3911,
7.2760, ..., 9.9537, 6.5125, 7.5133]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:352: AssertionError
_____________________________________________________________________________________________ test_matmul[16-dtype14-sdd-True-False] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'sdd', trans_a = True, trans_b = False
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[19.4000, 21.2085, 17.2626, ..., 29.4252, 29.0357, 30.3427],\n [19.4239, 21.6188, 17.5557, ..., 28..., 30.8725, 31.2582],\n [35.9650, 37.1245, 35.6984, ..., 27.4908, 28.2339, 28.2403]]]],\n device='cuda:0'), tensor([[[[19.4000, 19.4239, 18.3452, ..., 29.5145, 28.4773, 29.5899],\n [21.2085, 21.6188, 20.8695, ..., 28..., 30.8725, 28.2339],\n [35.4469, 35.0486, 40.4683, ..., 26.8261, 31.2582, 28.2403]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
______________________________________________________________________________________________ test_matmul[16-dtype15-sdd-True-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'sdd', trans_a = True, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[16.6002, 18.2478, 13.4669, ..., 25.9451, 25.6677, 26.0942],\n [20.7412, 23.0264, 18.4304, ..., 27..., 30.6817, 34.0804],\n [33.9480, 34.5858, 36.3433, ..., 29.5946, 27.6837, 31.4218]]]],\n device='cuda:0'), tensor([[[[16.6002, 20.7412, 18.8973, ..., 28.4696, 29.7352, 26.7173],\n [18.2478, 23.0264, 21.7679, ..., 27..., 30.6817, 27.6837],\n [33.7383, 37.8429, 36.3986, ..., 31.8884, 34.0804, 31.4218]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
_____________________________________________________________________________________________ test_matmul[16-dtype17-dsd-False-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dsd', trans_a = False, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
assert allclose(ref_dx, st_dx)
> assert allclose(ref_dw, st_dw)
E AssertionError: assert False
E + where False = allclose(tensor([[[[18.8306, 15.4051, 20.5179, ..., 9.4702, 12.4456, 10.4211],\n [17.6555, 15.5962, 18.1549, ..., 11..., 18.2374, 16.3979],\n [18.7140, 17.7162, 16.2511, ..., 14.3712, 15.7780, 14.2453]]]],\n device='cuda:0'), tensor([[[[18.8306, 17.6555, 15.9134, ..., 14.4461, 11.5269, 11.8117],\n [15.4051, 15.5962, 13.7993, ..., 11..., 18.2374, 15.7780],\n [26.4535, 19.9620, 21.2492, ..., 17.2536, 16.3979, 14.2453]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:352: AssertionError
______________________________________________________________________________________________ test_matmul[16-dtype19-dsd-True-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dsd', trans_a = True, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
assert allclose(ref_dx, st_dx)
> assert allclose(ref_dw, st_dw)
E AssertionError: assert False
E + where False = allclose(tensor([[[[ 6.1987, 6.6541, 6.1035, ..., 17.3724, 18.4083, 22.4095],\n [ 6.6077, 6.5493, 6.6188, ..., 18..., 8.3150, 8.6550],\n [17.1273, 18.5145, 16.2477, ..., 7.4840, 7.8811, 7.5407]]]],\n device='cuda:0'), tensor([[[[ 6.1987, 6.6077, 5.4255, ..., 20.8259, 19.6450, 19.2312],\n [ 6.6541, 6.5493, 6.0281, ..., 18..., 8.3150, 7.8811],\n [25.5614, 22.5099, 23.1032, ..., 7.8368, 8.6550, 7.5407]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:352: AssertionError
_____________________________________________________________________________________________ test_matmul[16-dtype22-dds-True-False] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dds', trans_a = True, trans_b = False
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[19.4000, 21.2085, 17.2626, ..., 22.0433, 19.4535, 21.8816],\n [19.4239, 21.6188, 17.5557, ..., 23..., 17.6377, 18.5949],\n [14.9210, 16.5670, 16.2262, ..., 16.1409, 16.6919, 17.3924]]]],\n device='cuda:0'), tensor([[[[19.4000, 19.4239, 18.3452, ..., 21.3079, 20.1844, 22.4448],\n [21.2085, 21.6188, 20.8695, ..., 23..., 17.6377, 16.6919],\n [16.9149, 17.7246, 17.9125, ..., 17.7202, 18.5949, 17.3924]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
______________________________________________________________________________________________ test_matmul[16-dtype23-dds-True-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dds', trans_a = True, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[38.8488, 40.6006, 39.4168, ..., 37.1583, 37.8707, 39.2315],\n [38.4693, 42.6132, 39.3447, ..., 39..., 51.2968, 50.1901],\n [47.7016, 47.3861, 48.2378, ..., 45.5591, 48.6820, 49.2167]]]],\n device='cuda:0'), tensor([[[[38.8488, 38.4693, 41.7933, ..., 38.4504, 42.3083, 36.1750],\n [40.6006, 42.6132, 44.5984, ..., 39..., 51.2968, 48.6820],\n [42.7509, 45.4472, 45.4716, ..., 48.8592, 50.1901, 49.2167]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
======================================================================================================== warnings summary =========================================================================================================
tests/unit/test_sparse_attention.py::test_softmax[dtype0-256-16]
/private/home/songweig/projects/DeepSpeed/tests/unit/test_sparse_attention.py:138: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1607370172916/work/torch/csrc/utils/python_arg_parser.cpp:882.)
nnz = mask.nonzero()
-- Docs: https://docs.pytest.org/en/stable/warnings.html
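As a side note, the deprecation warning above comes from the bare mask.nonzero() call at tests/unit/test_sparse_attention.py:138; on torch 1.7 the warning goes away if the as_tuple argument is passed explicitly, which returns the same 2-D index tensor. A minimal sketch with a hypothetical stand-in mask:

```python
import torch

mask = torch.tensor([[True, False],
                     [False, True]])

# Deprecated overload (triggers the UserWarning on torch 1.7):
#   nnz = mask.nonzero()
# Explicit form with identical output:
nnz = mask.nonzero(as_tuple=False)
print(nnz.tolist())  # [[0, 0], [1, 1]]
```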
===================================================================================================== short test summary info =====================================================================================================
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype13-sdd-False-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype14-sdd-True-False] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype15-sdd-True-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype17-dsd-False-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype19-dsd-True-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype22-dds-True-False] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype23-dds-True-True] - AssertionError: assert False
============================================================================================ 7 failed, 47 passed, 1 warning in 50.33s =============================================================================================
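In each failing assertion, the two tensors printed by allclose appear to hold the same values with the last two dimensions swapped (e.g. 10.6649, 8.8779, 13.0814, ... runs across the first row of one tensor and down the first column of the other), which suggests a transposition bug rather than tolerance noise. A minimal NumPy sketch of that kind of symptom check, with hypothetical stand-in tensors (not the actual test outputs):

```python
import numpy as np

# Stand-ins for ref_dw / st_dw, shaped like the batched Z x H x block x block
# tensors in the failures above; the sparse result is simulated as a transposed
# copy of the reference to reproduce the symptom.
ref_dw = np.arange(3 * 2 * 4 * 4, dtype=np.float32).reshape(3, 2, 4, 4)
st_dw = np.swapaxes(ref_dw, -1, -2)

print(np.allclose(ref_dw, st_dw))                       # False, like the test
print(np.allclose(ref_dw, np.swapaxes(st_dw, -1, -2)))  # True: only the layout differs
```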
In addition, here is my ds_report output for your reference:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/private/home/songweig/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/private/home/songweig/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.4.2, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Please refer to https://github.com/microsoft/DeepSpeed/issues/1348!
Hi Reza, the code I used was not the test but the one from the repo mentioned above. I was able to get things to work with older versions of deepspeed and triton: deepspeed==0.3.1 and triton==0.2.3. I'm wondering if you have any thoughts on what has changed in the latest version to cause the error? Thanks, Songwei
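For anyone hitting the same failures, the reported-working combination can be captured in a requirements fragment (version numbers taken from the comment above; verify they suit your environment):

```
# requirements.txt fragment: last combination reported to pass these tests
deepspeed==0.3.1
triton==0.2.3
```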