Training on Single GPU
Thanks for the exciting work.
I am trying to fine-tune on my own ImageNet-like classification dataset on a single GPU using the following command:
python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --model xcit_nano_12_p16 --batch-size 16 --drop-path 0.05 --output_dir experiments/xcit_nano_12_p16/ --epochs 30 --pretrained /mnt/hdd1/Projects/XCiT/xcit_nano_12_p16_224.pth
But it fails with the following error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 1, 128]], which is output 0 of SliceBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
What could be done to resolve this? I am new to distributed training.
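For anyone hitting the same message: the hint in the error refers to `torch.autograd.set_detect_anomaly(True)`. Here is a hypothetical minimal reproduction of this class of error (not XCiT's actual code), showing how to enable anomaly detection so the traceback points at the offending in-place write:

```python
import torch

# As the error hint suggests, anomaly mode records forward-pass tracebacks
# so the backward error points at the in-place operation that caused it.
torch.autograd.set_detect_anomaly(True)

# Hypothetical repro: an in-place write into a tensor that autograd saved.
x = torch.randn(16, 1, 128, requires_grad=True)
y = x.exp()     # exp() saves its output for the backward pass
y[:, 0] = 0.0   # in-place write invalidates that saved output

failed = False
try:
    y.sum().backward()
except RuntimeError:
    failed = True  # anomaly mode's extra traceback points at the line above
print("backward failed:", failed)
```

With anomaly detection enabled, the RuntimeError is accompanied by a second traceback identifying where the corrupted tensor was produced in the forward pass.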
Top GitHub Comments
I found the culprit (I was way off before). A workaround is to set `tokens_norm=True` (here, for example). Going by the comments, this will hurt your performance if you're just doing inference with a pretrained xcit_nano.

Solution by @dwhite54 above works, so closing.