Error during training for custom dataset


When I try to train the model with the command below, training fails with an error (the traceback ends in an AssertionError). It looks like there may be a problem with the GPUs (I am using four).

The command I run:

python train.py --gpus 0,1,2,3 --cfg $cfg

Error:

[2019-10-06 08:56:13,423 INFO train.py line 246 3390] Outputing checkpoints to: ckpt/test-resnet50dilated-ppm_deepsup
# samples: 7296
1 Epoch = 5000 iters
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    main(cfg, gpus)
  File "train.py", line 200, in main
    train(segmentation_module, iterator_train, optimizers, history, epoch+1, cfg)
  File "train.py", line 32, in train
    batch_data = next(iterator)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/xView2/semantic-segmentation-pytorch/dataset.py", line 162, in __getitem__
    assert(segm.mode == "L")
AssertionError
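
The traceback ends at the assertion in dataset.py that the segmentation mask has mode "L", which in Pillow means an 8-bit single-channel image. That points at the annotation files rather than the GPUs. The following is a minimal sketch for listing mask files that would fail that check. It assumes the masks are image files that Pillow can open; the function name `find_non_l_masks` and the `*.png` pattern are hypothetical and not part of the repository.

```python
from pathlib import Path

from PIL import Image


def find_non_l_masks(mask_dir, pattern="*.png"):
    """Return (path, mode) pairs for mask files whose Pillow mode is not "L".

    mask_dir and pattern are assumptions about where the annotation
    masks live; adjust them to match the dataset layout.
    """
    offenders = []
    # Walk the directory tree in a stable order so the report is repeatable.
    for path in sorted(Path(mask_dir).rglob(pattern)):
        # Image.open reads only the header, which is enough to learn the mode.
        with Image.open(path) as img:
            if img.mode != "L":
                offenders.append((str(path), img.mode))
    return offenders
```

Calling `find_non_l_masks` on the annotation directory returns an empty list when every mask would pass the assertion. If it reports modes such as "RGB" or "P", the masks likely need to be re-exported as single-channel label images. Note that a plain `convert("L")` on an RGB or palette image can change the stored label values, so the converted masks should be checked before training.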

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

mdt48 commented, Jan 22, 2020

@bao18 how did you change the config options to solve this problem?

DecentMakeover commented, Nov 9, 2019

@bao18 out of curiosity, was your 0 label the background, or a class for a specific object?
