How to use a different number of classes? (tried to look at other issues)

❓ Questions and Help

Hi there, I’ve been trying to get the repo to work with a new dataset (DDSM, mammography data), and I believe I’m close, but the final step is to actually use the correct number of classes. I’ve modified the dataset to resemble the structure of COCO.

The DDSM dataset has three classes (background, benign, and malignant). To get it to work, I followed the example in #166 (changed ROI_BOX_HEAD.NUM_CLASSES to 3 and modified the Checkpointer class). However, I’m still getting the following error:

2018-12-14 03:36:35,444 maskrcnn_benchmark.trainer INFO: Start training
start_iter 0
getting item 2491
classes: tensor([3])
self.json_category_id_to_contiguous_id: {0: 1, 1: 2, 2: 3}
/opt/conda/conda-bld/pytorch-nightly_1544606458595/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch-nightly_1544606458595/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
getting item 2767
classes: tensor([3])
self.json_category_id_to_contiguous_id: {0: 1, 1: 2, 2: 3}
Traceback (most recent call last):
  File "tools/train_net.py", line 169, in <module>
    main()
  File "tools/train_net.py", line 162, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 71, in train
    arguments,
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 82, in do_train
    loss_dict = model(images, targets)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 23, in forward
    x, detections, loss_box = self.box(features, proposals, targets)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 55, in forward
    [class_logits], [box_regression]
  File "/scratch/jtb470/fb-mrcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/loss.py", line 139, in __call__
    classification_loss = F.cross_entropy(class_logits, labels)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/jtb470/.conda/envs/cv-fb-mrcnn/lib/python3.7/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch-nightly_1544606458595/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:111

I’ve tried looking at #15 and other issues, and quite frankly I’m still lost as to the right procedure for using a different number of classes. What am I missing? What else do I need to do?

If it’s any help, this is my config file:

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
    OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 3
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("ddsm_train",)
  TEST: ("ddsm_val",)
DATALOADER:
  NUM_WORKERS: 0
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.0025
  WEIGHT_DECAY: 0.0001
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  IMS_PER_BATCH: 2
TEST:
  IMS_PER_BATCH: 2

Thank you so much in advance.

Issue Analytics

Created 5 years ago
Comments: 10 (4 by maintainers)

Top GitHub Comments

2 reactions

adrifloresm commented, Jan 7, 2019

@fmassa thank you for your response. Indeed my issue was that I did not know I had to count the background class for the config setting, so “ROI_BOX_HEAD.NUM_CLASSES” had to be 5. Issue #297 helped me realize that!
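
For the log in the original question, the same counting rule can be checked by hand. The sketch below is only an illustration in plain Python: the helper names `required_num_classes` and `out_of_range_labels` are hypothetical and not part of maskrcnn-benchmark, and the values are copied from the printed `json_category_id_to_contiguous_id` and `classes` lines. It assumes label 0 is reserved for background, which is what the mapping starting at 1 suggests.

```python
# Hypothetical helpers, not part of maskrcnn-benchmark.

def required_num_classes(json_category_id_to_contiguous_id):
    """Smallest class count that covers background (0) plus every mapped label."""
    return max(json_category_id_to_contiguous_id.values()) + 1


def out_of_range_labels(labels, num_classes):
    """Labels that would violate the check `t >= 0 && t < n_classes`."""
    return [t for t in labels if not 0 <= t < num_classes]


# Values copied from the log in the question.
mapping = {0: 1, 1: 2, 2: 3}
labels = [3]
configured = 3  # ROI_BOX_HEAD.NUM_CLASSES in the posted config

needed = required_num_classes(mapping)
bad = out_of_range_labels(labels, configured)
print("needed:", needed, "out of range:", bad)
```

With these values the mapped labels run from 1 to 3, so a class count of 3 leaves label 3 out of range. Whether the better fix is to raise the setting or to drop the explicit background entry from the annotation file's categories depends on how the dataset was converted.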

I also made the mistake of not deleting the previous checkpoint (I should have deleted the output folder after testing with 81 classes), so training was loading that checkpoint instead of creating a new one.
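
A leftover checkpoint like this can be spotted before training starts. The following is a minimal sketch, assuming the checkpointer records its most recent save in a marker file named `last_checkpoint` inside the output directory. Both that file name and the helper `has_saved_checkpoint` are assumptions made for illustration, so check them against your copy of the repo.

```python
import os
import tempfile


def has_saved_checkpoint(output_dir, marker="last_checkpoint"):
    """Return True if output_dir contains the (assumed) checkpoint marker file."""
    return os.path.exists(os.path.join(output_dir, marker))


# Demonstration on a throwaway directory instead of a real output folder.
with tempfile.TemporaryDirectory() as output_dir:
    before = has_saved_checkpoint(output_dir)

    # Simulate a previous run having left its marker file behind.
    with open(os.path.join(output_dir, "last_checkpoint"), "w") as f:
        f.write("model_0002500.pth")  # hypothetical checkpoint name

    after = has_saved_checkpoint(output_dir)

print("before:", before, "after:", after)
```

If the check returns True for an output directory produced with a different class count, pointing training at a fresh directory or removing the old one avoids resuming from weights whose prediction layers have the wrong shape.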

Thanks for the help!

1 reaction

BobZhangHT commented, Jan 23, 2019

Sincere thanks for your suggestion! :)
