
ValueError during loading of dataset


❓ Questions and Help

Hi, I tried loading a custom dataset in COCO format and followed the instructions regarding the symlink for the dataset. As background: I labelled the images on Labelbox, exported the annotation file in JSON WKT format, downloaded the images stored as URLs in the annotation file, split the annotation file and the images into train, validation, and test sets, and converted the annotation files to JSON COCO format. To use maskrcnn_benchmark, I configured my own .yaml file so that it has the correct paths:

...
DATASETS:
  TRAIN: ("coco_nuclei_train", "coco_nuclei_val")
  TEST: ("coco_nuclei_test",)
...

I also configured my own paths_catalog.py file, where I specified the correct paths to the symlinks in maskrcnn-benchmark/datasets/coco_nuclei; it looks like this:

...
DATASETS = {
    "coco_nuclei_train": {
        "img_dir": "coco_nuclei/train",
        "ann_file": "coco_nuclei/annotation/train_coco.json"
    },
    "coco_nuclei_val": {
        "img_dir": "coco_nuclei/train",
        "ann_file": "coco_nuclei/annotation/val_coco.json"
    },
    "coco_nuclei_test": {
        "img_dir": "coco_nuclei/test",
        "ann_file": "coco_nuclei/annotation/test_coco.json"
    }
}
...

Still, when I run the training, it reads the annotation files without problems, but it apparently cannot load the training/validation data, as it raises a ValueError:

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
[<maskrcnn_benchmark.data.datasets.concat_dataset.ConcatDataset object at 0x2acd4fd43160>]
Traceback (most recent call last):
  File "tools/train_net.py", line 172, in <module>
    main()
  File "tools/train_net.py", line 165, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 60, in train
    start_iter=arguments["iteration"],
  File "/home/max/github/maskrcnn-benchmark/maskrcnn_benchmark/data/build.py", line 159, in make_data_loader
    sampler = make_data_sampler(dataset, shuffle, is_distributed)
  File "/home/max/github/maskrcnn-benchmark/maskrcnn_benchmark/data/build.py", line 63, in make_data_sampler
    sampler = torch.utils.data.sampler.RandomSampler(dataset)
  File "/home/max/anaconda3/envs/pt_mask_Rcnn_env/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 64, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integeral value, but got num_samples=0

I have checked the paths multiple times. Maybe this issue has occurred for someone else as well; any help would be MUCH appreciated! If I am missing important information, please let me know. Thank you in advance!
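One quick way to narrow this down (an editorial sketch, not from the thread) is to sanity-check each annotation file before training: confirm it actually lists images and that every annotation's image_id points at an existing image. The function below uses only the standard library; the file path you pass in is whatever your paths_catalog.py entries point at.

```python
import json

def check_coco_file(ann_path):
    """Report basic consistency stats for a COCO-format annotation file.

    Returns True when the file lists at least one image and every
    annotation's image_id matches an image id, else False.
    """
    with open(ann_path) as f:
        coco = json.load(f)
    image_ids = {img["id"] for img in coco.get("images", [])}
    ann_image_ids = {ann["image_id"] for ann in coco.get("annotations", [])}
    orphans = ann_image_ids - image_ids  # annotations pointing at missing images
    print(f"{ann_path}: {len(image_ids)} images, "
          f"{len(coco.get('annotations', []))} annotations, "
          f"{len(orphans)} orphaned annotation image_ids")
    return len(image_ids) > 0 and not orphans
```

Running it over train_coco.json and val_coco.json would immediately reveal a mismatch between image ids and annotation image_ids, which is one plausible cause of an empty dataset here.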

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
fmassa commented, Feb 6, 2019

I believe that the dataset that you loaded might end up being empty.

Here is what I’d recommend: add a

from IPython import embed; embed()

in this part of the code https://github.com/facebookresearch/maskrcnn-benchmark/blob/d3fed42afe59910dd650308c426640d183d044b5/maskrcnn_benchmark/data/build.py#L46 and check that the dataset returned there is actually valid. For example, check that len(datasets[0]) is what you would expect, and also that datasets[0][0] properly returns the image and the annotation in the BoxList format. Also, do this without torch.distributed.launch, i.e., on a single GPU.

My guess is that your dataset is empty for some reason.

0 reactions
maxsenh commented, Feb 15, 2019

Sure. What I did: I downloaded the annotation file in JSON WKT format and used the labelbox2coco tool to convert the JSON file into COCO format. Then, for each entry in ‘images’, I changed the value of ‘file_name’ to the correct filename, removed the URL given by Labelbox (I had downloaded all the images first), and changed ‘id’ to an integer corresponding to the filename. Then, for each entry in ‘annotations’, I changed ‘image_id’ to the integer corresponding to the filename. It is maybe not the best way, but it worked! Hope it helps.
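The steps described above can be sketched as a small post-processing script. This is an illustration, not the author's actual code: the field names follow the standard COCO layout, and the filename-to-integer parsing (digits extracted from the file name's stem) is an assumption you would adjust to your own naming scheme.

```python
import json
import os

def fix_labelbox_coco(in_path, out_path):
    """Rewrite Labelbox-style string ids as integers derived from file names.

    Assumes 'file_name' already holds the local filename and that each
    filename contains the digits to use as the integer id (a hypothetical
    naming scheme for this sketch).
    """
    with open(in_path) as f:
        coco = json.load(f)
    id_map = {}
    for img in coco["images"]:
        # e.g. "nuclei_012.png" -> 12; adapt this parsing to your filenames
        stem = os.path.splitext(os.path.basename(img["file_name"]))[0]
        new_id = int("".join(ch for ch in stem if ch.isdigit()))
        id_map[img["id"]] = new_id
        img["id"] = new_id
    for ann in coco["annotations"]:
        # point each annotation at the new integer image id
        ann["image_id"] = id_map[ann["image_id"]]
    with open(out_path, "w") as f:
        json.dump(coco, f)
```

After a pass like this, every annotation's image_id matches an integer image id, which is what the COCO loader expects.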
