test_sparse_attention.py failed with CUDA 11.0
Hi, I'm running into an issue building DeepSpeed with Sparse Attention in a V100 + CUDA 11.0 environment. Specifically, after removing this skip flag (https://github.com/microsoft/DeepSpeed/blob/2660cc4dd43348306c58913775ceb4878379abe5/tests/unit/test_sparse_attention.py#L243), I ran into the following errors with pytest tests/unit/test_sparse_attention.py -sv:
======================================================================================================= test session starts =======================================================================================================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /private/home/songweig/anaconda3/envs/deepspeed/bin/python
cachedir: .pytest_cache
rootdir: /private/home/songweig/projects/DeepSpeed
collected 54 items
tests/unit/test_sparse_attention.py::test_sparse_attention_module_availability PASSED
tests/unit/test_sparse_attention.py::test_matmul_module_availability PASSED
tests/unit/test_sparse_attention.py::test_softmax_module_availability PASSED
tests/unit/test_sparse_attention.py::test_sparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_densesparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_fixedsparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_variablesparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_bigbirdsparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_bslongformersparsityconfig_module_availability PASSED
tests/unit/test_sparse_attention.py::test_sparseselfattention_module_availability PASSED
tests/unit/test_sparse_attention.py::test_bertsparseselfattention_module_availability PASSED
tests/unit/test_sparse_attention.py::test_sparseattentionutils_availability PASSED
tests/unit/test_sparse_attention.py::test_cpp_utils_availability Sparse Attention cpp_utils Module is not installed!
PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-256-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-256-32] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-576-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype0-576-32] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-256-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-256-32] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-576-16] PASSED
tests/unit/test_sparse_attention.py::test_softmax[dtype1-576-32] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype0-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype1-sdd-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype2-sdd-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype3-sdd-True-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype4-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype5-dsd-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype6-dsd-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype7-dsd-True-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype8-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype9-dds-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype10-dds-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype11-dds-True-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype12-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype13-sdd-False-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype14-sdd-True-False] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype15-sdd-True-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype16-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype17-dsd-False-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype18-dsd-True-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype19-dsd-True-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype20-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype21-dds-False-True] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype22-dds-True-False] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype23-dds-True-True] FAILED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype24-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype25-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[16-dtype26-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[32-dtype27-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[32-dtype28-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[32-dtype29-dds-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[64-dtype30-sdd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[64-dtype31-dsd-False-False] PASSED
tests/unit/test_sparse_attention.py::test_matmul[64-dtype32-dds-False-False] PASSED
============================================================================================================ FAILURES =============================================================================================================
_____________________________________________________________________________________________ test_matmul[16-dtype13-sdd-False-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'sdd', trans_a = False, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
assert allclose(ref_dx, st_dx)
> assert allclose(ref_dw, st_dw)
E AssertionError: assert False
E + where False = allclose(tensor([[[[10.6649, 8.8779, 13.0814, ..., 9.5998, 10.8293, 11.3977],\n [10.1814, 9.0207, 11.7046, ..., 10..., 7.6252, 6.5125],\n [ 6.6710, 6.8485, 8.2308, ..., 6.6604, 8.7260, 7.5133]]]],\n device='cuda:0'), tensor([[[[10.6649, 10.1814, 9.6523, ..., 12.4634, 10.5354, 12.3890],\n [ 8.8779, 9.0207, 7.7923, ..., 10..., 7.6252, 8.7260],\n [ 8.8778, 9.3911,
7.2760, ..., 9.9537, 6.5125, 7.5133]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:352: AssertionError
_____________________________________________________________________________________________ test_matmul[16-dtype14-sdd-True-False] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'sdd', trans_a = True, trans_b = False
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[19.4000, 21.2085, 17.2626, ..., 29.4252, 29.0357, 30.3427],\n [19.4239, 21.6188, 17.5557, ..., 28..., 30.8725, 31.2582],\n [35.9650, 37.1245, 35.6984, ..., 27.4908, 28.2339, 28.2403]]]],\n device='cuda:0'), tensor([[[[19.4000, 19.4239, 18.3452, ..., 29.5145, 28.4773, 29.5899],\n [21.2085, 21.6188, 20.8695, ..., 28..., 30.8725, 28.2339],\n [35.4469, 35.0486, 40.4683, ..., 26.8261, 31.2582, 28.2403]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
______________________________________________________________________________________________ test_matmul[16-dtype15-sdd-True-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'sdd', trans_a = True, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[16.6002, 18.2478, 13.4669, ..., 25.9451, 25.6677, 26.0942],\n [20.7412, 23.0264, 18.4304, ..., 27..., 30.6817, 34.0804],\n [33.9480, 34.5858, 36.3433, ..., 29.5946, 27.6837, 31.4218]]]],\n device='cuda:0'), tensor([[[[16.6002, 20.7412, 18.8973, ..., 28.4696, 29.7352, 26.7173],\n [18.2478, 23.0264, 21.7679, ..., 27..., 30.6817, 27.6837],\n [33.7383, 37.8429, 36.3986, ..., 31.8884, 34.0804, 31.4218]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
_____________________________________________________________________________________________ test_matmul[16-dtype17-dsd-False-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dsd', trans_a = False, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
assert allclose(ref_dx, st_dx)
> assert allclose(ref_dw, st_dw)
E AssertionError: assert False
E + where False = allclose(tensor([[[[18.8306, 15.4051, 20.5179, ..., 9.4702, 12.4456, 10.4211],\n [17.6555, 15.5962, 18.1549, ..., 11..., 18.2374, 16.3979],\n [18.7140, 17.7162, 16.2511, ..., 14.3712, 15.7780, 14.2453]]]],\n device='cuda:0'), tensor([[[[18.8306, 17.6555, 15.9134, ..., 14.4461, 11.5269, 11.8117],\n [15.4051, 15.5962, 13.7993, ..., 11..., 18.2374, 15.7780],\n [26.4535, 19.9620, 21.2492, ..., 17.2536, 16.3979, 14.2453]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:352: AssertionError
______________________________________________________________________________________________ test_matmul[16-dtype19-dsd-True-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dsd', trans_a = True, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
assert allclose(ref_dx, st_dx)
> assert allclose(ref_dw, st_dw)
E AssertionError: assert False
E + where False = allclose(tensor([[[[ 6.1987, 6.6541, 6.1035, ..., 17.3724, 18.4083, 22.4095],\n [ 6.6077, 6.5493, 6.6188, ..., 18..., 8.3150, 8.6550],\n [17.1273, 18.5145, 16.2477, ..., 7.4840, 7.8811, 7.5407]]]],\n device='cuda:0'), tensor([[[[ 6.1987, 6.6077, 5.4255, ..., 20.8259, 19.6450, 19.2312],\n [ 6.6541, 6.5493, 6.0281, ..., 18..., 8.3150, 7.8811],\n [25.5614, 22.5099, 23.1032, ..., 7.8368, 8.6550, 7.5407]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:352: AssertionError
_____________________________________________________________________________________________ test_matmul[16-dtype22-dds-True-False] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dds', trans_a = True, trans_b = False
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[19.4000, 21.2085, 17.2626, ..., 22.0433, 19.4535, 21.8816],\n [19.4239, 21.6188, 17.5557, ..., 23..., 17.6377, 18.5949],\n [14.9210, 16.5670, 16.2262, ..., 16.1409, 16.6919, 17.3924]]]],\n device='cuda:0'), tensor([[[[19.4000, 19.4239, 18.3452, ..., 21.3079, 20.1844, 22.4448],\n [21.2085, 21.6188, 20.8695, ..., 23..., 17.6377, 16.6919],\n [16.9149, 17.7246, 17.9125, ..., 17.7202, 18.5949, 17.3924]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
______________________________________________________________________________________________ test_matmul[16-dtype23-dds-True-True] ______________________________________________________________________________________________
block = 16, dtype = torch.float32, mode = 'dds', trans_a = True, trans_b = True
@pytest.mark.parametrize("block, dtype, mode, trans_a, trans_b", testdata)
def test_matmul(block, dtype, mode, trans_a, trans_b):
_skip_on_cuda_compatability()
Z = 3
H = 2
M = 128
N = 256
K = 192
rho = 0.5
x, w, dy, shape, layout = init_matmul_inputs(Z, H, M, N, K, rho, mode, trans_a, trans_b, block, dtype, layout=None)
ref_y, ref_dx, ref_dw = run_matmul_reference(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
st_y, st_dx, st_dw = run_matmul_sparse(x.clone(), w.clone(), mode, trans_a, trans_b, layout, block, dy)
assert allclose(ref_y, st_y)
> assert allclose(ref_dx, st_dx)
E AssertionError: assert False
E + where False = allclose(tensor([[[[38.8488, 40.6006, 39.4168, ..., 37.1583, 37.8707, 39.2315],\n [38.4693, 42.6132, 39.3447, ..., 39..., 51.2968, 50.1901],\n [47.7016, 47.3861, 48.2378, ..., 45.5591, 48.6820, 49.2167]]]],\n device='cuda:0'), tensor([[[[38.8488, 38.4693, 41.7933, ..., 38.4504, 42.3083, 36.1750],\n [40.6006, 42.6132, 44.5984, ..., 39..., 51.2968, 48.6820],\n [42.7509, 45.4472, 45.4716, ..., 48.8592, 50.1901, 49.2167]]]],\n device='cuda:0'))
tests/unit/test_sparse_attention.py:351: AssertionError
======================================================================================================== warnings summary =========================================================================================================
tests/unit/test_sparse_attention.py::test_softmax[dtype0-256-16]
/private/home/songweig/projects/DeepSpeed/tests/unit/test_sparse_attention.py:138: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1607370172916/work/torch/csrc/utils/python_arg_parser.cpp:882.)
nnz = mask.nonzero()
-- Docs: https://docs.pytest.org/en/stable/warnings.html
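As a side note, the deprecation warning above comes from the bare mask.nonzero() call at tests/unit/test_sparse_attention.py:138; on torch 1.7 the warning goes away if the as_tuple argument is passed explicitly, which returns the same 2-D index tensor. A minimal sketch with a hypothetical stand-in mask:

```python
import torch

mask = torch.tensor([[True, False],
                     [False, True]])

# Deprecated overload (triggers the UserWarning on torch 1.7):
#   nnz = mask.nonzero()
# Explicit form with identical output:
nnz = mask.nonzero(as_tuple=False)
print(nnz.tolist())  # [[0, 0], [1, 1]]
```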
===================================================================================================== short test summary info =====================================================================================================
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype13-sdd-False-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype14-sdd-True-False] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype15-sdd-True-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype17-dsd-False-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype19-dsd-True-True] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype22-dds-True-False] - AssertionError: assert False
FAILED tests/unit/test_sparse_attention.py::test_matmul[16-dtype23-dds-True-True] - AssertionError: assert False
============================================================================================ 7 failed, 47 passed, 1 warning in 50.33s =============================================================================================
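In each failing assertion, the two tensors printed by allclose appear to hold the same values with the last two dimensions swapped (e.g. 10.6649, 8.8779, 13.0814, ... runs across the first row of one tensor and down the first column of the other), which suggests a transposition bug rather than tolerance noise. A minimal NumPy sketch of that kind of symptom check, with hypothetical stand-in tensors (not the actual test outputs):

```python
import numpy as np

# Stand-ins for ref_dw / st_dw, shaped like the batched Z x H x block x block
# tensors in the failures above; the sparse result is simulated as a transposed
# copy of the reference to reproduce the symptom.
ref_dw = np.arange(3 * 2 * 4 * 4, dtype=np.float32).reshape(3, 2, 4, 4)
st_dw = np.swapaxes(ref_dw, -1, -2)

print(np.allclose(ref_dw, st_dw))                       # False, like the test
print(np.allclose(ref_dw, np.swapaxes(st_dw, -1, -2)))  # True: only the layout differs
```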
In addition, here is my ds_report output for your reference:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/private/home/songweig/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/private/home/songweig/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.4.2, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Please refer to https://github.com/microsoft/DeepSpeed/issues/1348!
Hi Reza, the code I used was not the test but the one from the repo mentioned above. I was able to get things to work with older versions of deepspeed and triton: deepspeed==0.3.1 and triton==0.2.3. I'm wondering if you have any thoughts on what has changed in the latest version to cause the error? Thanks, Songwei
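For anyone hitting the same failures, the reported-working combination can be captured in a requirements fragment (version numbers taken from the comment above; verify they suit your environment):

```
# requirements.txt fragment: last combination reported to pass these tests
deepspeed==0.3.1
triton==0.2.3
```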