NCCL version
Hi,
I installed the environment from the yaml file and installed torch 1.8 following the settings in the README.
My CUDA version is 11.4, and it looks like there is a version conflict between NCCL, PyTorch, and CUDA.
Is my CUDA version too high?
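For reference, the installed PyTorch build, the CUDA toolkit it was compiled against, and whether the NCCL backend is available can be checked with a short snippet like this (a minimal sketch, not output from this run):

# Minimal sketch: report the installed PyTorch build, the CUDA toolkit it was
# compiled against, and whether the NCCL backend is available.
import torch
import torch.distributed as dist

print("torch:", torch.__version__)            # installed PyTorch version
print("built for CUDA:", torch.version.cuda)  # None for CPU-only wheels
print("NCCL backend available:", dist.is_nccl_available())
if dist.is_nccl_available():
    print("NCCL:", torch.cuda.nccl.version())  # NCCL bundled with this build

The full command and log are below.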
ssh://xh@210.28.134.34:22/home2/xh/.conda/envs/skg/bin/python -u -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_prefix_compwebq.cfg --run_name T5_base_prefix_compwebq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 2 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_prefix_compwebq --overwrite_output_dir --per_device_train_batch_size 2 --per_device_eval_batch_size 4 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/__init__.py:422: UserWarning: torch.set_deterministic is deprecated and will be removed in a future release. Please use torch.use_deterministic_algorithms instead
"torch.set_deterministic is deprecated and will be removed in a future "
[W Context.cpp:70] Warning: torch.use_deterministic_algorithms is in beta, and its design and functionality may change in the future. (function operator())
INFO:filelock:Lock 140211123887800 acquired on .lock
/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/__init__.py:422: UserWarning: torch.set_deterministic is deprecated and will be removed in a future release. Please use torch.use_deterministic_algorithms instead
"torch.set_deterministic is deprecated and will be removed in a future "
[W Context.cpp:70] Warning: torch.use_deterministic_algorithms is in beta, and its design and functionality may change in the future. (function operator())
/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/__init__.py:422: UserWarning: torch.set_deterministic is deprecated and will be removed in a future release. Please use torch.use_deterministic_algorithms instead
"torch.set_deterministic is deprecated and will be removed in a future "
[W Context.cpp:70] Warning: torch.use_deterministic_algorithms is in beta, and its design and functionality may change in the future. (function operator())
/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/__init__.py:422: UserWarning: torch.set_deterministic is deprecated and will be removed in a future release. Please use torch.use_deterministic_algorithms instead
"torch.set_deterministic is deprecated and will be removed in a future "
[W Context.cpp:70] Warning: torch.use_deterministic_algorithms is in beta, and its design and functionality may change in the future. (function operator())
INFO:filelock:Lock 140211123887800 released on .lock
INFO:filelock:Lock 140144150953768 acquired on .lock
INFO:filelock:Lock 140144150953768 released on .lock
INFO:filelock:Lock 139898741587640 acquired on .lock
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 41, in main
training_args, = parser.parse_args_into_dataclasses()
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/hf_argparser.py", line 191, in parse_args_into_dataclasses
obj = dtype(**inputs)
File "<string>", line 83, in __init__
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 702, in __post_init__
if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 873, in device
return self._setup_devices
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1717, in __get__
cached = self.fget(obj)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 858, in _setup_devices
torch.distributed.init_process_group(backend="nccl")
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 510, in init_process_group
timeout=timeout))
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 603, in _new_process_group_helper
timeout)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
INFO:filelock:Lock 139898741587640 released on .lock
INFO:filelock:Lock 139711655354096 acquired on .lock
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 41, in main
training_args, = parser.parse_args_into_dataclasses()
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/hf_argparser.py", line 191, in parse_args_into_dataclasses
obj = dtype(**inputs)
File "<string>", line 83, in __init__
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 702, in __post_init__
if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 873, in device
return self._setup_devices
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1717, in __get__
cached = self.fget(obj)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 858, in _setup_devices
torch.distributed.init_process_group(backend="nccl")
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 510, in init_process_group
timeout=timeout))
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 603, in _new_process_group_helper
timeout)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
INFO:filelock:Lock 139711655354096 released on .lock
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 41, in main
training_args, = parser.parse_args_into_dataclasses()
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/hf_argparser.py", line 191, in parse_args_into_dataclasses
obj = dtype(**inputs)
File "<string>", line 83, in __init__
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 702, in __post_init__
if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 873, in device
return self._setup_devices
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1717, in __get__
cached = self.fget(obj)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 858, in _setup_devices
torch.distributed.init_process_group(backend="nccl")
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 510, in init_process_group
timeout=timeout))
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 603, in _new_process_group_helper
timeout)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 41, in main
training_args, = parser.parse_args_into_dataclasses()
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/hf_argparser.py", line 191, in parse_args_into_dataclasses
obj = dtype(**inputs)
File "<string>", line 83, in __init__
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 702, in __post_init__
if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 873, in device
return self._setup_devices
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1717, in __get__
cached = self.fget(obj)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
return func(*args, **kwargs)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 858, in _setup_devices
torch.distributed.init_process_group(backend="nccl")
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 510, in init_process_group
timeout=timeout))
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 603, in _new_process_group_helper
timeout)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
Killing subprocess 30316
Killing subprocess 30320
Killing subprocess 30321
Killing subprocess 30322
Traceback (most recent call last):
File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home2/xh/.conda/envs/skg/bin/python', '-u', 'train.py', '--local_rank=3', '--seed', '2', '--cfg', 'Salesforce/T5_base_prefix_compwebq.cfg', '--run_name', 'T5_base_prefix_compwebq', '--logging_strategy', 'steps', '--logging_first_step', 'true', '--logging_steps', '4', '--evaluation_strategy', 'steps', '--eval_steps', '500', '--metric_for_best_model', 'avr', '--greater_is_better', 'true', '--save_strategy', 'steps', '--save_steps', '500', '--save_total_limit', '1', '--load_best_model_at_end', '--gradient_accumulation_steps', '2', '--num_train_epochs', '400', '--adafactor', 'true', '--learning_rate', '5e-5', '--do_train', '--do_eval', '--do_predict', '--predict_with_generate', '--output_dir', 'output/T5_base_prefix_compwebq', '--overwrite_output_dir', '--per_device_train_batch_size', '2', '--per_device_eval_batch_size', '4', '--generation_num_beams', '4', '--generation_max_length', '128', '--input_max_length', '1024', '--ddp_find_unused_parameters', 'true']' returned non-zero exit status 1.
Process finished with exit code 1
I also tried torch 1.11 + cu113 and got another error:
(skg) xh@4210GPU:~/PycharmProject/UnifiedSKG$ python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_finetune_compwebq.cfg --run_name T5_base_finetune_compwebq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 2 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_finetune_compwebq --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 4 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 29, in main
torch.set_deterministic(True)
AttributeError: module 'torch' has no attribute 'set_deterministic'
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 29, in main
torch.set_deterministic(True)
AttributeError: module 'torch' has no attribute 'set_deterministic'
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 29, in main
torch.set_deterministic(True)
AttributeError: module 'torch' has no attribute 'set_deterministic'
Traceback (most recent call last):
File "train.py", line 225, in <module>
main()
File "train.py", line 29, in main
torch.set_deterministic(True)
AttributeError: module 'torch' has no attribute 'set_deterministic'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 17913) of binary: /home2/xh/.conda/envs/skg/bin/python
Traceback (most recent call last):
File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
)(*cmd_args)
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2022-03-14_21:36:50
host : 4210GPU
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 17914)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2022-03-14_21:36:50
host : 4210GPU
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 17915)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2022-03-14_21:36:50
host : 4210GPU
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 17916)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-03-14_21:36:50
host : 4210GPU
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 17913)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Looking forward to your reply. Thank you.

I think it is a PyTorch version issue. In my experience, removing the following lines makes it work with other PyTorch versions: https://github.com/HKUNLP/UnifiedSKG/blob/fab45fea3a349c9dbda4ed34482df227920272db/train.py#L27-L29
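For completeness, a version-tolerant variant of that toggle (a minimal sketch, assuming those lines only enable deterministic mode) would be:

# Sketch: enable deterministic mode with whichever API the installed PyTorch has.
# torch.set_deterministic() was deprecated in 1.8 and later removed;
# torch.use_deterministic_algorithms() is its replacement.
import torch

if hasattr(torch, "use_deterministic_algorithms"):
    torch.use_deterministic_algorithms(True)
elif hasattr(torch, "set_deterministic"):
    torch.set_deterministic(True)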
Indeed, we met a similar situation during our experiments on some machines (we used a lot of GPUs on different kinds of HPC clusters). I remember we fixed that issue by installing the proper PyTorch build for your CUDA version.
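After reinstalling, a quick way to confirm that the new build actually sees the GPUs (a minimal sketch, aimed at the "ProcessGroupNCCL ... no GPUs found" error above):

# Sketch: verify that the reinstalled PyTorch build can see the GPUs.
import torch

print("CUDA available:", torch.cuda.is_available())
print("visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))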