ValueError during loading of dataset
❓ Questions and Help
Hi, I tried loading a custom dataset in COCO format and followed the instructions regarding the symlink for the dataset. As background: I labelled the images on Labelbox, exported the annotation file in JSON WKT format, downloaded the images referenced by URL in the annotation file, split the annotation file and the images into train, validation and test sets, and converted the annotation files to JSON COCO format. For use with maskrcnn_benchmark, I configured my own .yaml file so that it has the correct paths:
...
DATASETS:
  TRAIN: ("coco_nuclei_train", "coco_nuclei_val")
  TEST: ("coco_nuclei_test",)
...
I also configured my own paths_catalog.py file, where I specified the correct paths to the symlinks in maskrcnn-benchmark/datasets/coco_nuclei which looks like this:
...
DATASETS = {
    "coco_nuclei_train": {
        "img_dir": "coco_nuclei/train",
        "ann_file": "coco_nuclei/annotation/train_coco.json"
    },
    "coco_nuclei_val": {
        "img_dir": "coco_nuclei/train",
        "ann_file": "coco_nuclei/annotation/val_coco.json"
    },
    "coco_nuclei_test": {
        "img_dir": "coco_nuclei/test",
        "ann_file": "coco_nuclei/annotation/test_coco.json"
    }
}
...
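One quick way to rule out broken paths is a small sanity check along these lines. This is only a sketch, not part of maskrcnn-benchmark; it assumes DatasetCatalog.DATA_DIR points at the default datasets directory and should be run from the repository root:

# sanity_check_paths.py -- ad-hoc helper, not part of maskrcnn-benchmark.
# Verifies that the img_dir / ann_file entries above resolve through the symlinks.
import os

DATA_DIR = "datasets"  # assumed default of DatasetCatalog.DATA_DIR in paths_catalog.py

ENTRIES = [
    "coco_nuclei/train",
    "coco_nuclei/test",
    "coco_nuclei/annotation/train_coco.json",
    "coco_nuclei/annotation/val_coco.json",
    "coco_nuclei/annotation/test_coco.json",
]

for rel in ENTRIES:
    path = os.path.join(DATA_DIR, rel)
    print(path, "->", os.path.realpath(path), "exists:", os.path.exists(path))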
Still, when I run the training it reads the annotation files without a problem, but apparently it cannot load the training/validation data, as it raises a ValueError:
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
[<maskrcnn_benchmark.data.datasets.concat_dataset.ConcatDataset object at 0x2acd4fd43160>]
Traceback (most recent call last):
  File "tools/train_net.py", line 172, in <module>
    main()
  File "tools/train_net.py", line 165, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 60, in train
    start_iter=arguments["iteration"],
  File "/home/max/github/maskrcnn-benchmark/maskrcnn_benchmark/data/build.py", line 159, in make_data_loader
    sampler = make_data_sampler(dataset, shuffle, is_distributed)
  File "/home/max/github/maskrcnn-benchmark/maskrcnn_benchmark/data/build.py", line 63, in make_data_sampler
    sampler = torch.utils.data.sampler.RandomSampler(dataset)
  File "/home/max/anaconda3/envs/pt_mask_Rcnn_env/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 64, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integeral value, but got num_samples=0
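The last line comes from torch.utils.data.RandomSampler, which make_data_sampler constructs over the dataset and which refuses a dataset of length zero. A minimal sketch that reproduces the same error outside the training script (the variable name is made up; the exact message wording depends on the PyTorch version):

import torch.utils.data as data

empty_dataset = []                             # any sized object with len() == 0
sampler = data.RandomSampler(empty_dataset)    # raises ValueError: ... num_samples=0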
I checked the paths multiple times. Maybe someone else has run into this issue as well; help would be much appreciated! If I am missing important information, please let me know. Thank you in advance!
Top GitHub Comments
I believe that the dataset that you loaded might end up being empty.
Here is what I’d recommend: add a breakpoint in this part of the code https://github.com/facebookresearch/maskrcnn-benchmark/blob/d3fed42afe59910dd650308c426640d183d044b5/maskrcnn_benchmark/data/build.py#L46 and check that the dataset that is returned there is actually valid. For example, check that len(datasets[0]) is what you would expect, and also that datasets[0][0] returns the image and the annotation in the BoxList format properly. Also, do this without torch.distributed.launch, i.e., on a single GPU. My guess is that your dataset is empty for some reason.
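A rough sketch of that check, to be pasted temporarily where build.py assembles the list of datasets (the print statements are made up; the unpacking assumes COCODataset's usual (image, target, index) return):

for d in datasets:
    print(type(d).__name__, "len =", len(d))
    if len(d) > 0:
        img, target, idx = d[0]                 # target should be a BoxList
        print("first sample:", type(img), target)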
Sure. What I did was download the annotation file in JSON WKT format and then use the labelbox2coco tool to convert the JSON file into COCO format. Then, for each image in ‘images’, I changed the value of ‘file_name’ to the correct filename, removed the URL given by Labelbox (I had downloaded all the images first), and changed ‘id’ to an integer corresponding to the filename. Then, for each entry in ‘annotations’, I changed ‘image_id’ to the integer corresponding to the filename. It is maybe not the best way, but it worked! Hope it helps.
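A rough sketch of that clean-up step, assuming filenames like 0042.png whose numeric stem becomes the integer id (the paths and the filename-to-id rule here are assumptions; adapt them to your own naming scheme):

# fix_coco_ids.py -- ad-hoc sketch of the post-processing described above.
import json
import os

with open("train_coco.json") as f:
    coco = json.load(f)

old_to_new = {}
for img in coco["images"]:
    file_name = os.path.basename(img["file_name"])   # strip any URL/path prefix
    new_id = int(os.path.splitext(file_name)[0])     # e.g. "0042.png" -> 42
    old_to_new[img["id"]] = new_id
    img["file_name"] = file_name
    img["id"] = new_id

for ann in coco["annotations"]:
    ann["image_id"] = old_to_new[ann["image_id"]]    # keep annotations consistent

with open("train_coco_fixed.json", "w") as f:
    json.dump(coco, f)

If I understand maskrcnn-benchmark's COCODataset correctly, it filters out images without matching annotations during training, so mismatched ‘id’ / ‘image_id’ values would leave the dataset empty, which matches the num_samples=0 error above.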