cli: Confused on (str, int, List[int]) variants for argparse for --gpus flag?
🐛 Bug
A colleague (@siyuanfeng-tri) and I sometimes get confused about how the --gpus flag is interpreted by argparse. I see the following docs:
https://pytorch-lightning.readthedocs.io/en/1.2.1/advanced/multi_gpu.html#select-gpu-devices
But we’re sometimes confused about when the argparse interpretation treats the value as a GPU count (int/str) and when it treats it as device indices (List[int]).
Are there docs for this? If not, can that be clarified somehow?
Please reproduce using the BoringModel
see notebook
The main complaint is that gpus=3 implies gpus=[0,1,2], while gpus="3" implies gpus=[3].
Mix that with the implicit conversion from str to int done by argparse, and you get a kinda weird public interface.
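To make the ambiguity concrete, here is a minimal sketch of the behavior described above. It assumes the rules from the linked 1.2.1 docs (an int is a count, a str is a comma-separated list of device indices). Both `parse_gpus_sketch` and `cli_gpus_type` are hypothetical stand-ins written for this illustration, not Lightning's actual code, and the special value -1 (all GPUs) is deliberately left out.

```python
import argparse
from typing import List, Optional, Union


def parse_gpus_sketch(gpus: Union[int, str, List[int], None]) -> Optional[List[int]]:
    """Hypothetical stand-in for how Trainer(gpus=...) resolves device indices."""
    if gpus is None or (isinstance(gpus, int) and gpus == 0):
        return None  # no GPUs requested, run on CPU
    if isinstance(gpus, int):
        # int is read as a count: take the first `gpus` devices
        return list(range(gpus))
    if isinstance(gpus, str):
        # str is read as comma-separated device indices
        return [int(x) for x in gpus.split(",") if x.strip()]
    return list(gpus)  # already a list of device indices


def cli_gpus_type(value: str) -> Union[int, str]:
    """Hypothetical argparse `type=`: keep comma lists as str, otherwise convert to int."""
    return value if "," in value else int(value)


parser = argparse.ArgumentParser()
parser.add_argument("--gpus", type=cli_gpus_type, default=None)

# Same digit "3", three different spellings on the command line.
for argv in (["--gpus", "3"], ["--gpus", "3,"], ["--gpus", "1,3"]):
    args = parser.parse_args(argv)
    print(argv, "->", repr(args.gpus), "->", parse_gpus_sketch(args.gpus))
```

Under these assumptions, `--gpus 3` on the command line becomes the int 3 (a count), whereas passing the string "3" directly to the Python API would be read as device index 3. That mismatch between the CLI and the Python call is the source of the confusion.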
To Reproduce
example notebook: https://colab.research.google.com/drive/1pe9_F2S73-gQ3hOeh_MMiGhmbXmGURDQ?usp=sharing
Expected behavior
Less confusing / more explicit options? (maybe my complaint is with the weird implicit behavior of Trainer(gpus=...)?)
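As one illustration of what "more explicit options" could mean, the sketch below splits the count and the device indices into two mutually exclusive flags. The flag names `--num-gpus` and `--gpu-ids` and the helper `resolve_devices` are hypothetical, invented for this example. They are not existing Lightning options and not necessarily what the maintainers proposed.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where count and indices cannot be confused (hypothetical flags)."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--num-gpus", type=int, default=None,
                       help="how many GPUs to use, starting from device 0")
    group.add_argument("--gpu-ids", type=int, nargs="+", default=None,
                       help="explicit device indices to use")
    return parser


def resolve_devices(args: argparse.Namespace) -> Optional[List[int]]:
    """Turn the parsed flags into a list of device indices, or None for CPU."""
    if args.gpu_ids is not None:
        return args.gpu_ids
    if args.num_gpus:
        return list(range(args.num_gpus))
    return None


if __name__ == "__main__":
    print(resolve_devices(build_parser().parse_args()))
```

With this layout, `--num-gpus 3` always means the first three devices and `--gpu-ids 3` always means device 3, so the meaning no longer depends on whether the value arrived as a str or an int.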
Environment
- PyTorch Version (e.g., 1.7.1):
- OS: Ubuntu 18.04
- How you installed PyTorch: pip
- Python version: 3.6.9
- CUDA/cuDNN version: N/A
- GPU models and configuration: N/A
- Any other relevant information: N/A
Additional context
N/A
Comments: 13 (12 by maintainers)
I vote for making it consistent!
I vote for @awaelchli's proposal. I got confused by that too some time ago.