CUDA 11 cannot be supported

I wanted to run DeepSpeed on an RTX 3090, which only works with CUDA 11. In your Docker release I updated PyTorch to version 1.7.0, and running it produced this error:

building GPT2 model ...
Traceback (most recent call last):
  File "pretrain_gpt2.py", line 711, in <module>
    main()
  File "pretrain_gpt2.py", line 659, in main
    model, optimizer, lr_scheduler = setup_model_and_optimizer(args)
  File "pretrain_gpt2.py", line 158, in setup_model_and_optimizer
    model = get_model(args)
  File "pretrain_gpt2.py", line 69, in get_model
    parallel_output=True)
  File "/data/Megatron-LM/model/gpt2_modeling.py", line 81, in __init__
    checkpoint_num_layers)
  File "/data/Megatron-LM/mpu/transformer.py", line 384, in __init__
    [get_layer() for _ in range(num_layers)])
  File "/data/Megatron-LM/mpu/transformer.py", line 384, in <listcomp>
    [get_layer() for _ in range(num_layers)])
  File "/data/Megatron-LM/mpu/transformer.py", line 380, in get_layer
    output_layer_init_method=output_layer_init_method)
  File "/data/Megatron-LM/mpu/transformer.py", line 259, in __init__
    self.input_layernorm = LayerNorm(hidden_size, eps=layernorm_epsilon)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

How can I use it on the 3090?
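
An `undefined symbol` error when loading a compiled extension usually suggests that the `.so` file was built against a different PyTorch build than the one now installed, which would be consistent with upgrading PyTorch inside the Docker image without rebuilding apex. That reading is an assumption, not something confirmed in the issue. A minimal sketch for checking whether the extension loads follows; the helper names `check_extension` and `looks_like_abi_mismatch` are hypothetical and only for illustration.

```python
import importlib


def check_extension(module_name):
    """Try to import a module and report whether it loads.

    Returns (True, "") on success, or (False, error_message) on ImportError.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, ""


def looks_like_abi_mismatch(error_message):
    """Heuristic (an assumption): 'undefined symbol' hints that the shared
    object was compiled against a different library build."""
    return "undefined symbol" in error_message


if __name__ == "__main__":
    # Import torch first when it is present, because the compiled extension
    # links against PyTorch's shared libraries.
    check_extension("torch")

    ok, message = check_extension("fused_layer_norm_cuda")
    if ok:
        print("fused_layer_norm_cuda loaded")
    elif looks_like_abi_mismatch(message):
        print("Extension found but failed to link; rebuilding apex may help:")
        print(message)
    else:
        print("Extension not importable:", message)
```

If the heuristic fires, rebuilding apex against the currently installed PyTorch is the usual next step to try.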

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

1 reaction
hujian233 commented, Dec 3, 2020

@jeffra, I am excited. Yes, it worked very well on the RTX 3090 with CUDA 11.0 and PyTorch 1.8.0 or PyTorch 1.7.0. This question can be closed. Thanks again.

0 reactions
hujian233 commented, Dec 3, 2020

@jeffra, hi, sounds great. I tried the same thing last night using CUDA 11.1 and torch 1.8.0 without the version check, and it compiled successfully. I will try the latest code. Thank you very much.
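
When comparing a setup against the combinations reported in these comments, it can help to print what the installed PyTorch build actually reports. The sketch below is only an illustration: `describe_environment` is a hypothetical helper, and it returns empty values when PyTorch is not installed or no GPU is visible. An RTX 3090 should report compute capability `(8, 6)`.

```python
def describe_environment():
    """Collect version details relevant to an extension/PyTorch mismatch."""
    info = {"torch": None, "cuda_build": None, "gpu": None, "capability": None}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed; nothing more to report

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The `cuda_build` value is the CUDA version PyTorch was compiled with, which can differ from the toolkit version that `nvcc` reports on the same machine.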

Top Results From Across the Web

CUDA Compatibility :: NVIDIA Data Center GPU Driver ...
CUDA Compatibility document describes the use of new CUDA toolkit components on systems with older base installations.

NVIDIA CUDA Installation Guide for Microsoft Windows
System Requirements. To use CUDA on your system, you will need the following installed: ‣ A CUDA-capable GPU. ‣ A supported version of...

Problem while installing cuda toolkit in ubuntu 18.04
I just ran into this issue and solved it by running the following commands: sudo apt clean sudo apt update sudo apt purge...

WSL with CUDA support - Hacker News
21H2 cannot currently be updated to (yeah, it's because they are pushing Windows 11) so you need to download it as an ISO....

Different CUDA versions shown by nvcc and NVIDIA-smi
You have one of the recent 410.x drivers installed which support CUDA 10. The version the driver supports has nothing to do with...
