Error during training for custom dataset
When trying to train the model with the command below, a RuntimeError occurred. It seems to be a problem with the GPUs (I am using four).
The command I ran:
python train.py --gpus 0,1,2,3 --cfg $cfg
Error:
[2019-10-06 08:56:13,423 INFO train.py line 246 3390] Outputing checkpoints to: ckpt/test-resnet50dilated-ppm_deepsup
# samples: 7296
1 Epoch = 5000 iters
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    main(cfg, gpus)
  File "train.py", line 200, in main
    train(segmentation_module, iterator_train, optimizers, history, epoch+1, cfg)
  File "train.py", line 32, in train
    batch_data = next(iterator)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/xView2/semantic-segmentation-pytorch/dataset.py", line 162, in __getitem__
    assert(segm.mode == "L")
AssertionError
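The last frame of the traceback is a failed check in dataset.py, raised in a data-loading worker, so the error is about the annotation files and not the GPUs. Assuming `segm` is a Pillow image, mode "L" means 8-bit single-channel grayscale, so at least one annotation mask in the custom dataset is stored in another format (for example RGB or palette). The sketch below is one way to find such files before training. It assumes the masks are PNG files and reads the bit depth and colour type directly from each file header; the function names and the directory argument are hypothetical and not part of the repository.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour-type codes defined by the PNG specification.
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def png_format(path):
    """Return (bit_depth, colour_type_name) read from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    # IHDR data starts at byte 16: width (4 bytes), height (4), bit depth (1), colour type (1).
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return bit_depth, COLOR_TYPES.get(color_type, f"unknown({color_type})")


def find_non_grayscale_masks(annotation_dir):
    """List PNG masks under annotation_dir that are not 8-bit grayscale."""
    bad = []
    for path in sorted(Path(annotation_dir).rglob("*.png")):
        bit_depth, color = png_format(path)
        if (bit_depth, color) != (8, "grayscale"):
            bad.append((str(path), bit_depth, color))
    return bad


# Hypothetical usage (the directory name is an assumption):
# for path, bit_depth, color in find_non_grayscale_masks("data/annotations"):
#     print(f"{path}: {bit_depth}-bit {color}")
```

Any file this reports would be expected to fail the `segm.mode == "L"` check. How to re-encode it as a single-channel label map depends on how the class labels were stored, so the sketch only reports the files and does not convert them.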
Issue Analytics
- State:
- Created 4 years ago
- Comments: 9 (2 by maintainers)
Top GitHub Comments
@bao18 how did you change the config options to solve this problem?
@bao18 out of curiosity, was your 0 label the background, or a class for a specific object?