Attempted to restart training on a COCO dataset after 2 epochs... failed with runtime error
I was training on COCO. Everything went smoothly, and I got into the 3rd epoch and paused the training. When I went to restart, I got errors. I'm unsure how to proceed with debugging or cleaning up.
CUDA_VISIBLE_DEVICES=0 python trainval_net.py --dataset coco --net res101 --bs 1 --nw 1 --lr .001 --lr_decay_step 10 --cuda --r true --checksession 1 --checkepoch 2 --checkpoint 234531 --use_tfb
234532 roidb entries
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
loading checkpoint models/res101/coco/faster_rcnn_1_2_234531.pth
loaded checkpoint models/res101/coco/faster_rcnn_1_2_234531.pth
Traceback (most recent call last):
File "trainval_net.py", line 339, in <module>
optimizer.step()
File "/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
(pytorch1_py36) emcp@k:faster-rcnn.pytorch$
Issue Analytics
- State:
- Created 5 years ago
- Comments:9 (5 by maintainers)
Top GitHub Comments
I ran into a similar problem. For me, it was solved by moving the lines:
above the assignment of the optimizer, i.e. above:
According to the documentation, it is best practice to move the model to the GPU before initializing/assigning the optimizer.
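A minimal sketch of that ordering when resuming from a checkpoint, in case it helps. This is not the code from trainval_net.py: the tiny nn.Linear model, the helper names (build_model, train_step, buffers_match_params) and the checkpoint keys "model" and "optimizer" are assumptions made up for illustration. The point is only the order: move the model to the device first, then create the optimizer, then load the saved optimizer state, so the SGD momentum buffers end up on the same device as the parameters.

```python
import torch
import torch.nn as nn

# Falls back to CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def build_model():
    # Hypothetical stand-in; the real script builds a Faster R-CNN here.
    return nn.Linear(4, 2)


def train_step(model, optimizer):
    # One SGD step on random data, enough to create/use momentum buffers.
    loss = model(torch.randn(8, 4, device=device)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def buffers_match_params(optimizer, params):
    # True if every momentum buffer lives on the same device as its parameter.
    return all(
        optimizer.state[p]["momentum_buffer"].device == p.device for p in params
    )


# "First run": train one step, then keep an in-memory checkpoint.
# The keys "model" and "optimizer" are assumed names, not taken from the repo.
model = build_model().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
train_step(model, optimizer)
checkpoint = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}

# "Resume": 1) build, 2) move to device, 3) create optimizer, 4) load state.
resumed = build_model()
resumed.to(device)  # move BEFORE the optimizer is created
resumed_opt = torch.optim.SGD(resumed.parameters(), lr=0.001, momentum=0.9)
resumed.load_state_dict(checkpoint["model"])
resumed_opt.load_state_dict(checkpoint["optimizer"])

# With this ordering the restored buffers and the parameters share a device,
# so the next optimizer step does not hit a CPU/CUDA type mismatch.
train_step(resumed, resumed_opt)
```

If the optimizer state is loaded while the parameters are still on the CPU and the model is moved to the GPU afterwards, the restored momentum buffers can be left behind on the CPU, which would match the "expected type torch.FloatTensor but got torch.cuda.FloatTensor" error in the traceback above.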
@john2020-210 https://github.com/jwyang/faster-rcnn.pytorch/tree/pytorch-1.0