How to use a different amount of classes? (tried to look at other issues)
❓ Questions and Help
Hi there, I’ve been trying to get the repo to work with a new dataset (DDSM - mammography data), and I believe I’m close, but the final step is to actually use the correct number of classes. I’ve modified the dataset to resemble the structure of COCO.
In the DDSM dataset, there are three classes (background, benign, and malignant). To get it to work, I followed the example in #166 (changed ROI_BOX_HEAD.NUM_CLASSES to 3 and modified the Checkpointer class). However, I’m still getting the following error:
2018-12-14 03:36:35,444 maskrcnn_benchmark.trainer INFO: Start training
start_iter 0
getting item 2491
classes: tensor([3])
self.json_category_id_to_contiguous_id: {0: 1, 1: 2, 2: 3}
/opt/conda/conda-bld/pytorch-nightly_1544606458595/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch-nightly_1544606458595/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
getting item 2767
classes: tensor([3])
self.json_category_id_to_contiguous_id: {0: 1, 1: 2, 2: 3}
Traceback (most recent call last):
  File "tools/train_net.py", line 169, in <module>
    main()
  File "tools/train_net.py", line 162, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 71, in train
    arguments,
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 82, in do_train
    loss_dict = model(images, targets)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 23, in forward
    x, detections, loss_box = self.box(features, proposals, targets)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 55, in forward
    [class_logits], [box_regression]
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/loss.py", line 139, in __call__
    classification_loss = F.cross_entropy(class_logits, labels)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch-nightly_1544606458595/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:111
I’ve tried looking at #15 and other issues, and quite frankly I’m still lost as to the right procedure for using a different number of classes. What am I missing? What else do I need to do?
If it’s any help, this is my config file:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
    OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 3
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("ddsm_train",)
  TEST: ("ddsm_val",)
DATALOADER:
  NUM_WORKERS: 0
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.0025
  WEIGHT_DECAY: 0.0001
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  IMS_PER_BATCH: 2
TEST:
  IMS_PER_BATCH: 2
Thank you so much in advance.
Top GitHub Comments
@fmassa thank you for your response. Indeed my issue was that I did not know I had to count the background class for the config setting, so “ROI_BOX_HEAD.NUM_CLASSES” had to be 5. Issue #297 helped me realize that!
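To make the counting rule concrete, here is a minimal sketch in plain Python. It assumes the dataset maps its JSON category ids to contiguous labels starting at 1, with 0 reserved for background, which is what the mapping printed in the log of the question ({0: 1, 1: 2, 2: 3}) suggests. The helper names are hypothetical and are not part of maskrcnn-benchmark.

```python
def contiguous_id_map(category_ids):
    """Map dataset category ids to labels 1..N (0 is left for background).

    Hypothetical helper; it only mirrors the mapping printed in the log.
    """
    return {cat_id: i + 1 for i, cat_id in enumerate(sorted(category_ids))}


def required_num_classes(category_ids):
    """Number of categories in the annotation file, plus one for background."""
    return len(set(category_ids)) + 1


def labels_fit(mapping, num_classes):
    """Same condition as the CUDA assertion: 0 <= t < n_classes for every label."""
    return all(0 <= label < num_classes for label in mapping.values())


mapping = contiguous_id_map([0, 1, 2])
print(mapping)                             # {0: 1, 1: 2, 2: 3}
print(labels_fit(mapping, 3))              # False: label 3 is out of range
print(labels_fit(mapping, 4))              # True
print(required_num_classes([1, 2, 3, 4]))  # 5
```

Under that assumption, an annotation file that lists three categories needs NUM_CLASSES of 4, not 3. If the category with id 0 in the file is itself the background, removing it from the annotation's category list would be the alternative.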
I also made the mistake of not deleting the previous checkpoint (the output folder left over from testing with 81 classes), so training was loading that checkpoint instead of creating a new one.
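A quick way to catch this before training is to look for leftover files in the output folder. The sketch below is only an illustration: it assumes checkpoints are saved as `.pth` files next to a `last_checkpoint` marker file, and the function name and the `output` path are hypothetical.

```python
from pathlib import Path


def find_stale_checkpoints(output_dir):
    """Return leftover checkpoint files in output_dir (empty list if none).

    Assumption: checkpoints are "*.pth" files and the most recent one is
    recorded in a "last_checkpoint" marker file in the same folder.
    """
    out = Path(output_dir)
    if not out.is_dir():
        return []
    stale = sorted(out.glob("*.pth"))
    marker = out / "last_checkpoint"
    if marker.is_file():
        stale.append(marker)
    return stale


if __name__ == "__main__":
    # "output" is a placeholder for whatever output folder the run uses.
    for path in find_stale_checkpoints("output"):
        print("leftover from a previous run:", path)
```

If the list is not empty and the old run used a different NUM_CLASSES, deleting or moving those files (or pointing the run at a fresh output folder) avoids resuming from weights with the wrong number of outputs.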
Thanks for the help!
Sincerely thanks for your suggestion! : )