Error when training htc_x101_64x4d_fpn_20e_16 model on a Custom Dataset

Describe the bug

I tried training the htc_x101_64x4d_fpn_20e_16gpu model on a custom dataset. I set the ‘seg_prefix’ location to the folder that contains my segmentation maps. But soon after I start the training, it gives me the error:

RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 100, 100, 3]

Also, can you please tell me what the difference is between htc without semantic and htc with semantic?

Reproduction

  1. What command or script did you run?
python tools/train.py ~/Prateek/Prateek/mmdetection2/mmdetection/configs/htc/htc_x101_64x4d_fpn_20e_16gpu.py
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
I modified the num_classes according to the custom dataset. I’m not sure what value of num_classes I should set in ‘semantic_head’.

Environment

sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GPU 0,1: GeForce RTX 2080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel® Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel® 64 architecture applications
  • Intel® MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.1.2
MMCV: 0.2.16
MMDetection: 1.0rc1+4b984a7
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.1

Error traceback

2020-01-26 15:34:51,233 - INFO - workflow: [('train', 1)], max: 25 epochs

Traceback (most recent call last):
  File "tools/train.py", line 124, in <module>
    main()
  File "tools/train.py", line 120, in main
    timestamp=timestamp)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/apis/train.py", line 133, in train_detector
    timestamp=timestamp)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/apis/train.py", line 319, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 268, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/apis/train.py", line 100, in batch_processor
    losses = model(**data)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/models/detectors/base.py", line 138, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/models/detectors/htc.py", line 230, in forward_train
    loss_seg = self.semantic_head.loss(semantic_pred, gt_semantic_seg)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/core/fp16/decorators.py", line 127, in new_func
    return old_func(*args, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/models/mask_heads/fused_semantic_head.py", line 108, in loss
    loss_semantic_seg = self.criterion(mask_pred, labels)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 1840, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 100, 100, 3]

Thanks for the help!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
ZwwWayne commented, Jan 26, 2020

"htc without semantic" means HTC does not perform the semantic segmentation task, while the default htc also performs semantic segmentation and uses the segmentation features for instance segmentation. The error means the target has an unexpected shape. You may check what the target looks like for the COCO dataset and make sure the targets of your own data have similar dimensions.
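For reference, the size in the error ([1, 100, 100, 3]) suggests the segmentation map reached the loss with three colour channels, whereas the loss expects one class index per pixel, i.e. a target of shape [N, H, W]. Below is a minimal sketch of turning a colour-coded map into a single-channel label map, assuming the maps were saved as 3-channel images with one colour per class. The names PALETTE, IGNORE_LABEL and rgb_to_label_map are hypothetical and not part of MMDetection; the palette must be replaced with your dataset's own colours, in the same channel order as the array you load (OpenCV loads images as BGR).

```python
import numpy as np

# Hypothetical palette: colour triple -> class index. Replace with your own.
PALETTE = {
    (0, 0, 0): 0,
    (255, 0, 0): 1,
    (0, 255, 0): 2,
}
IGNORE_LABEL = 255  # assumed value for pixels whose colour is not in the palette


def rgb_to_label_map(seg, palette=PALETTE, ignore_label=IGNORE_LABEL):
    """Convert an (H, W, 3) colour-coded map to an (H, W) uint8 label map."""
    seg = np.asarray(seg)
    if seg.ndim == 2:
        # Already one label per pixel; nothing to convert.
        return seg.astype(np.uint8)
    if seg.ndim != 3 or seg.shape[2] != 3:
        raise ValueError(f"expected (H, W) or (H, W, 3), got {seg.shape}")
    # Start with every pixel marked as "ignore", then fill in known colours.
    labels = np.full(seg.shape[:2], ignore_label, dtype=np.uint8)
    for colour, class_index in palette.items():
        matches = np.all(seg == np.array(colour, dtype=seg.dtype), axis=-1)
        labels[matches] = class_index
    return labels


# Synthetic 100x100 three-channel map, like the one in the error message.
seg = np.zeros((100, 100, 3), dtype=np.uint8)
seg[10:20, 10:20] = (255, 0, 0)
label_map = rgb_to_label_map(seg)
print(label_map.shape, label_map.dtype)  # (100, 100) uint8
```

Saving the converted maps as single-channel images and pointing ‘seg_prefix’ at them should give targets with the expected number of dimensions, though whether further config changes are needed depends on your dataset.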

0 reactions
IAMShashankk commented, Jul 13, 2021

@prateek-77 I used the method mentioned in #1179 to create masked images like stuffthingmap. I am still facing the same error. Have you made any other changes besides this? I have opened a new issue, #5608; please have a look and let me know your thoughts.
