CUDA 11 cannot be supported
See original GitHub issue. I wanted to run DeepSpeed on an RTX 3090, which supports only CUDA 11. In your Docker release I updated the PyTorch version to 1.7.0 and got an error:
building GPT2 model ...
Traceback (most recent call last):
  File "pretrain_gpt2.py", line 711, in <module>
    main()
  File "pretrain_gpt2.py", line 659, in main
    model, optimizer, lr_scheduler = setup_model_and_optimizer(args)
  File "pretrain_gpt2.py", line 158, in setup_model_and_optimizer
    model = get_model(args)
  File "pretrain_gpt2.py", line 69, in get_model
    parallel_output=True)
  File "/data/Megatron-LM/model/gpt2_modeling.py", line 81, in __init__
    checkpoint_num_layers)
  File "/data/Megatron-LM/mpu/transformer.py", line 384, in __init__
    [get_layer() for _ in range(num_layers)])
  File "/data/Megatron-LM/mpu/transformer.py", line 384, in <listcomp>
    [get_layer() for _ in range(num_layers)])
  File "/data/Megatron-LM/mpu/transformer.py", line 380, in get_layer
    output_layer_init_method=output_layer_init_method)
  File "/data/Megatron-LM/mpu/transformer.py", line 259, in __init__
    self.input_layernorm = LayerNorm(hidden_size, eps=layernorm_epsilon)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
How can I use it on the 3090?
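For context: an `undefined symbol` error like `_ZN6caffe2...` from `fused_layer_norm_cuda` usually means the prebuilt apex extension was compiled against a different PyTorch build than the one now installed, so the extension no longer matches PyTorch's ABI. A common remedy (a sketch, not an official fix from this thread) is to reinstall apex from source against the currently installed PyTorch, using the build flags from the NVIDIA apex README:

```shell
# Remove the stale apex build that was compiled against the old PyTorch.
pip uninstall -y apex

# Rebuild apex from source so its CUDA extensions (including
# fused_layer_norm_cuda) are compiled against the installed PyTorch.
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

Note that the CUDA toolkit used to build apex should match the CUDA version PyTorch was built with (here, CUDA 11.x for PyTorch 1.7+ on an RTX 3090, which requires compute capability sm_86 support).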
Issue Analytics
- State:
- Created 3 years ago
- Comments: 15 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@jeffra, I am excited. Yeah, it worked very well on the RTX 3090 with CUDA 11.0 and PyTorch 1.8.0 (or PyTorch 1.7.0). This issue can be closed. Thanks again.
@jeffra, Hi, sounds great. I had tried the same thing last night using CUDA 11.1 and torch 1.8.0 with the version check removed, and it compiled successfully. I will try the latest code, thank you very much.