
Error during training for custom dataset


When I try to train the model with the command below on four GPUs, training stops with an AssertionError (full traceback below). It seems to be some problem with the GPUs.

The command I ran:

python train.py --gpus 0,1,2,3 --cfg $cfg

Error:

[2019-10-06 08:56:13,423 INFO train.py line 246 3390] Outputing checkpoints to: ckpt/test-resnet50dilated-ppm_deepsup
# samples: 7296
1 Epoch = 5000 iters
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    main(cfg, gpus)
  File "train.py", line 200, in main
    train(segmentation_module, iterator_train, optimizers, history, epoch+1, cfg)
  File "train.py", line 32, in train
    batch_data = next(iterator)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/xView2/semantic-segmentation-pytorch/dataset.py", line 162, in __getitem__
    assert(segm.mode == "L")
AssertionError
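The failing line is `assert(segm.mode == "L")` in `dataset.py`, so the crash comes from the data loader rather than the GPUs: at least one annotation image was not opened by Pillow in mode "L" (8-bit, single-channel grayscale). One plausible cause, assuming the masks are PNG files, is that some were saved as RGB or palette images. The sketch below is a hypothetical diagnostic helper (not part of the repository) that reads only the PNG IHDR header with the standard library and lists masks that are not 8-bit grayscale. The function names and the directory layout are assumptions for illustration.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour types defined for the IHDR chunk in the PNG specification.
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def png_header(path):
    """Return (bit_depth, color_type) from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        head = f.read(26)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width (4 bytes), height (4 bytes), bit depth (1), colour type (1).
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", head[16:26])
    return bit_depth, color_type


def find_non_grayscale_masks(mask_dir):
    """List PNG masks in mask_dir that are not 8-bit, single-channel grayscale."""
    bad = []
    for path in sorted(Path(mask_dir).glob("*.png")):
        bit_depth, color_type = png_header(path)
        if not (color_type == 0 and bit_depth == 8):
            bad.append((path.name, bit_depth, COLOR_TYPES.get(color_type, "unknown")))
    return bad
```

Calling `find_non_grayscale_masks("path/to/annotations")` (a placeholder path) returns tuples such as `("mask_001.png", 8, "RGB")` for each offending file. When fixing such files, note that Pillow's `convert("L")` computes luminance from the colours, so it does not turn RGB or palette colours into class indices; an explicit colour-to-index mapping (or, for palette images, the raw palette indices) is needed instead.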

Issue Analytics

Created 4 years ago
Comments: 9 (2 by maintainers)

Top GitHub Comments

mdt48 commented, Jan 22, 2020 (1 reaction):

@bao18 how did you change the config options to solve this problem?

DecentMakeover commented, Nov 9, 2019 (1 reaction):

@bao18 Out of curiosity, was your 0 label background, or a class for a specific object?
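Why label 0 matters can be illustrated with a small pure-Python sketch. It assumes the data loader subtracts 1 from every mask value and the loss is built with `ignore_index=-1` (check your copy of `dataset.py` and `train.py`; this is not confirmed on this page). Under that assumption, pixels labelled 0 are silently ignored, and raw labels 1..N become class indices 0..N-1. The helper names are hypothetical.

```python
IGNORE_INDEX = -1  # assumed ignore value passed to the loss


def shift_labels(mask):
    """Subtract 1 from every raw mask value, as the loader is assumed to do."""
    return [[value - 1 for value in row] for row in mask]


def pixels_used_in_loss(mask):
    """Count pixels whose shifted label is not ignored by the loss."""
    shifted = shift_labels(mask)
    return sum(1 for row in shifted for value in row if value != IGNORE_INDEX)


def max_class_index(mask):
    """Largest shifted label; it must be smaller than num_class."""
    return max(value for row in shift_labels(mask) for value in row)


mask = [[0, 1], [2, 2]]
print(shift_labels(mask))         # [[-1, 0], [1, 1]]
print(pixels_used_in_loss(mask))  # 3: the pixel labelled 0 is skipped
print(max_class_index(mask))      # 1: num_class must be at least 2
```

So if 0 marks a real object class rather than background, those pixels never contribute to training under this assumed setup.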

Top Results From Across the Web

Error when Training on Custom dataset #94 - GitHub

Hi, Thanks for nice work, I tried to use it in my dataset, and follow all instructions on how to train the model...

Fixing training errors - Rekognition - AWS Documentation

Fixing training errors · Download the validation results files. · Open the manifest summary file (manifest_summary. · Fix any errors in the manifest...

Error while training custom dataset for StyleGan..

So I made 224 square images to test StyleGan, but I am getting a lot of errors on the training part and not...

Step-by-step instructions for training YOLOv7 on a Custom ...

Follow this guide to get step-by-step instructions for running YOLOv7 model training within a Gradient Notebook on a custom dataset.