ValueError during loading of dataset
❓ Questions and Help
Hi, I tried loading a custom dataset in COCO format and followed the instructions regarding the symlink for the dataset. As background: I labelled the images on Labelbox, exported the annotation file in JSON WKT format, downloaded the images referenced by URL in the annotation file, split the annotation file and the images into train, validation and test sets, and converted the annotation files to JSON COCO format. For use with maskrcnn_benchmark, I configured my own .yaml file so that it has the correct paths:
...
DATASETS:
  TRAIN: ("coco_nuclei_train", "coco_nuclei_val")
  TEST: ("coco_nuclei_test",)
...
I also configured my own paths_catalog.py file, where I specified the correct paths to the symlinks in maskrcnn-benchmark/datasets/coco_nuclei which looks like this:
...
DATASETS = {
    "coco_nuclei_train": {
        "img_dir": "coco_nuclei/train",
        "ann_file": "coco_nuclei/annotation/train_coco.json"
    },
    "coco_nuclei_val": {
        "img_dir": "coco_nuclei/train",
        "ann_file": "coco_nuclei/annotation/val_coco.json"
    },
    "coco_nuclei_test": {
        "img_dir": "coco_nuclei/test",
        "ann_file": "coco_nuclei/annotation/test_coco.json"
    }
}
...
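One quick way to rule out broken paths is a small sanity check along these lines. This is only a sketch, not part of maskrcnn-benchmark; it assumes DatasetCatalog.DATA_DIR points at the default datasets directory and should be run from the repository root:

# sanity_check_paths.py -- ad-hoc helper, not part of maskrcnn-benchmark.
# Verifies that the img_dir / ann_file entries above resolve through the symlinks.
import os

DATA_DIR = "datasets"  # assumed default of DatasetCatalog.DATA_DIR in paths_catalog.py

ENTRIES = [
    "coco_nuclei/train",
    "coco_nuclei/test",
    "coco_nuclei/annotation/train_coco.json",
    "coco_nuclei/annotation/val_coco.json",
    "coco_nuclei/annotation/test_coco.json",
]

for rel in ENTRIES:
    path = os.path.join(DATA_DIR, rel)
    print(path, "->", os.path.realpath(path), "exists:", os.path.exists(path))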
Still, when I run the training it reads the annotation files without a problem, but apparently it cannot load the training/validation data, as it raises a ValueError:
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
[<maskrcnn_benchmark.data.datasets.concat_dataset.ConcatDataset object at 0x2acd4fd43160>]
Traceback (most recent call last):
  File "tools/train_net.py", line 172, in <module>
    main()
  File "tools/train_net.py", line 165, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 60, in train
    start_iter=arguments["iteration"],
  File "/home/max/github/maskrcnn-benchmark/maskrcnn_benchmark/data/build.py", line 159, in make_data_loader
    sampler = make_data_sampler(dataset, shuffle, is_distributed)
  File "/home/max/github/maskrcnn-benchmark/maskrcnn_benchmark/data/build.py", line 63, in make_data_sampler
    sampler = torch.utils.data.sampler.RandomSampler(dataset)
  File "/home/max/anaconda3/envs/pt_mask_Rcnn_env/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 64, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integeral value, but got num_samples=0
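The last line comes from torch.utils.data.RandomSampler, which make_data_sampler constructs over the dataset and which refuses a dataset of length zero. A minimal sketch that reproduces the same error outside the training script (the variable name is made up; the exact message wording depends on the PyTorch version):

import torch.utils.data as data

empty_dataset = []                             # any sized object with len() == 0
sampler = data.RandomSampler(empty_dataset)    # raises ValueError: ... num_samples=0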
I checked the paths multiple times. Maybe someone else has run into this issue as well; help would be much appreciated! If I am missing important information, please let me know. Thank you in advance!
Top GitHub Comments
I believe that the dataset that you loaded might end up being empty.
Here is what I’d recommend: add a breakpoint in this part of the code https://github.com/facebookresearch/maskrcnn-benchmark/blob/d3fed42afe59910dd650308c426640d183d044b5/maskrcnn_benchmark/data/build.py#L46 and check that the dataset that is returned there is actually valid. For example, check that len(datasets[0]) is what you would expect, and also that datasets[0][0] returns the image and the annotation in the BoxList format properly. Also, do this without torch.distributed.launch, i.e., on a single GPU. My guess is that your dataset is empty for some reason.
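A rough sketch of that check, to be pasted temporarily where build.py assembles the list of datasets (the print statements are made up; the unpacking assumes COCODataset's usual (image, target, index) return):

for d in datasets:
    print(type(d).__name__, "len =", len(d))
    if len(d) > 0:
        img, target, idx = d[0]                 # target should be a BoxList
        print("first sample:", type(img), target)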
Sure. What I did was download the annotation file in JSON WKT format and then use the labelbox2coco tool to convert the JSON file into COCO format. Then, for each image in ‘images’, I changed the value of ‘file_name’ to the correct filename, removed the URL given by Labelbox (I had downloaded all the images first), and changed ‘id’ to an integer corresponding to the filename. Then, for each entry in ‘annotations’, I changed ‘image_id’ to the integer corresponding to the filename. It is maybe not the best way, but it worked! Hope it helps.
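A rough sketch of that clean-up step, assuming filenames like 0042.png whose numeric stem becomes the integer id (the paths and the filename-to-id rule here are assumptions; adapt them to your own naming scheme):

# fix_coco_ids.py -- ad-hoc sketch of the post-processing described above.
import json
import os

with open("train_coco.json") as f:
    coco = json.load(f)

old_to_new = {}
for img in coco["images"]:
    file_name = os.path.basename(img["file_name"])   # strip any URL/path prefix
    new_id = int(os.path.splitext(file_name)[0])     # e.g. "0042.png" -> 42
    old_to_new[img["id"]] = new_id
    img["file_name"] = file_name
    img["id"] = new_id

for ann in coco["annotations"]:
    ann["image_id"] = old_to_new[ann["image_id"]]    # keep annotations consistent

with open("train_coco_fixed.json", "w") as f:
    json.dump(coco, f)

If I understand maskrcnn-benchmark's COCODataset correctly, it filters out images without matching annotations during training, so mismatched ‘id’ / ‘image_id’ values would leave the dataset empty, which matches the num_samples=0 error above.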