Sampling from custom object detection dataset

Hi all. Thanks a lot for this library… I am really looking forward to integrating it into my workflow.

I am trying to sample from my own multiclass object detection dataset. I have a set of large raster images (geotiffs) and polygons in a geojson. All the source data is in EPSG:4326. I am using torchgeo v0.2.0.

Here are my dataset classes:

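from torchgeo.datasets import RasterDataset, VectorDataset


class MyImageDataset(RasterDataset):
    # Glob pattern used to find the GeoTIFF files under root
    filename_glob = "**/*.tif"


class MyObjectDataset(VectorDataset):
    # Glob pattern used to find the GeoJSON file(s) under root
    filename_glob = "*.geojson"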

Then I instantiate the dataset like this:

imd = MyImageDataset(root=MY_IMAGE_ROOT)
objd = MyObjectDataset(root=MY_VECTORFILE_ROOT)

# imd & objd combines the two GeoDatasets into an IntersectionDataset
dataset = imd & objd

I am trying to sample from the image dataset, but only in spots where there is an object in my vector dataset. So, I combine the two datasets as above, since I believe that an IntersectionDataset is what I want.

Then I set up my sampler and dataloader:

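from torch.utils.data import DataLoader
from torchgeo.datasets import stack_samples
from torchgeo.samplers import RandomGeoSampler

# In v0.2.0, size is in CRS units (degrees for EPSG:4326)
sampler = RandomGeoSampler(dataset, size=0.002, length=1)
dl = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)

for sample in dl:
    image = sample['image']
    ...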

However, I keep getting this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [156], in <cell line: 15>()
     12 sampler = RandomGeoSampler(dataset, size=0.002, length=1)
     13 dl = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)
---> 15 for sample in dl:
     16     image = sample['image']

...

File ~/torchgeo_test/venv-torchgeo/lib/python3.8/site-packages/rasterio/features.py:321, in rasterize(shapes, out_shape, fill, out, transform, all_touched, merge_alg, default_value, dtype)
    318         warnings.warn('Invalid or empty shape {} at index {} will not be rasterized.'.format(geom, index), ShapeSkipWarning)
    320 if not valid_shapes:
--> 321     raise ValueError('No valid geometry objects found for rasterize')
    323 shape_values = np.array(shape_values)
    325 if not validate_dtype(shape_values, valid_dtypes):

ValueError: No valid geometry objects found for rasterize
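The last frame of the traceback shows rasterio's rasterize() raising as soon as it is handed no valid shapes. One possible workaround in your own code is to check for an empty shape list before rasterizing and return a blank mask instead. The sketch below is a hypothetical helper, not part of TorchGeo or rasterio. It takes the rasterize function as a parameter so it can run without rasterio installed.

```python
from typing import Any, Callable, Sequence, Tuple

import numpy as np


def rasterize_or_empty(
    shapes: Sequence[Any],
    out_shape: Tuple[int, int],
    rasterize_fn: Callable[..., np.ndarray],
    **kwargs: Any,
) -> np.ndarray:
    """Hypothetical helper: skip rasterization when a query window holds no
    geometries and return an all-zero (background) mask instead.

    Note: this only catches an *empty* list. rasterize() also raises when
    every shape it receives is invalid, and this check does not cover that.
    """
    if len(shapes) == 0:
        # uint8 background mask; the dtype here is an assumption
        return np.zeros(out_shape, dtype=np.uint8)
    return rasterize_fn(shapes, out_shape=out_shape, **kwargs)
```

With rasterio installed, a call might look like rasterize_or_empty(shapes, (height, width), rasterio.features.rasterize, transform=transform). Here shapes, height, width and transform come from whatever code builds the mask for a query window. Returning empty masks avoids the crash, but it also means some training samples contain no objects, which may not be what you want for object detection.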

I have experimented with different values for size. As a sanity check, I looked at a sample from just the raster dataset at this size, and everything looks good. (Note that I am on v0.2.0, and I don’t think the unit option for specifying sizes in pixels is available in that version.)

To troubleshoot, I have limited my image dataset to a single TIFF and my vector dataset to a GeoJSON file with only 100 objects, some (but not all) of which fall on that image. I have also focused on one specific ROI on that image, a 1000 m² bounding box where I know objects are located. I still receive the same error.

I can index into the raster dataset using that ROI bounding box and get back the correct imagery, and I can index into the vector dataset using the same bounding box and get back the correct mask for my objects. I can also sample from the MyImageDataset raster dataset, but I get the same error as above when attempting to sample from the MyObjectDataset vector dataset.

What am I doing wrong? Feel free to point me to any examples of setting up this type of dataset that I may have missed.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5

Top GitHub Comments

evu commented, Mar 17, 2022

This makes sense. Thanks! The objects are sparsely distributed across the raster extent, so I will probably look at implementing a class for this use case.
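A class for sparsely distributed objects could start from something like the sketch below. It assumes each object is summarised by its bounding box as (minx, miny, maxx, maxy) in the dataset CRS. The function names are hypothetical and not part of TorchGeo. The idea is to pick a random object first and then place a window around it, so that no window lands in an empty region.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def window_around_feature(feature: Box, size: float, rng: random.Random) -> Box:
    """Place a size x size window at a random offset that still contains
    the feature's centre point."""
    cx = (feature[0] + feature[2]) / 2
    cy = (feature[1] + feature[3]) / 2
    # minx in [cx - size, cx] guarantees minx <= cx <= minx + size
    minx = rng.uniform(cx - size, cx)
    miny = rng.uniform(cy - size, cy)
    return (minx, miny, minx + size, miny + size)


def object_centred_windows(
    features: List[Box], size: float, length: int, seed: int = 0
) -> List[Box]:
    """Hypothetical sampler logic: draw `length` windows, each anchored on a
    randomly chosen feature."""
    rng = random.Random(seed)
    return [window_around_feature(rng.choice(features), size, rng) for _ in range(length)]
```

If an object sits near the edge of the raster, a window can extend past that edge, so a real sampler would likely clip or reject such windows. To plug this into TorchGeo, each tuple would need to be converted to a torchgeo BoundingBox (field order minx, maxx, miny, maxy, mint, maxt) and yielded from the __iter__ method of a custom GeoSampler subclass. That wiring is an assumption about how such a class might be structured and is not shown here.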

adamjstewart commented, Mar 17, 2022

The internal representation of a GeoDataset is an r-tree index. Each entry in the r-tree is the bounding box of a file, whether it is a raster or vector file. When you combine two GeoDatasets into an IntersectionDataset, we compute the intersection of these bounding boxes so that we only sample from regions within bounding boxes from both datasets.
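To make the bounding-box intersection concrete, here is a minimal sketch in plain Python. It is not TorchGeo's code. It uses 2-D boxes, whereas TorchGeo's index also carries a time dimension, and it uses a nested loop in place of an r-tree query. It does show which regions end up eligible for sampling when a raster index and a vector index are intersected file by file.

```python
from typing import List, Optional, Tuple

# (minx, miny, maxx, maxy) in the dataset CRS
Box = Tuple[float, float, float, float]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap
    (boxes that only touch along an edge count as non-overlapping)."""
    minx, miny = max(a[0], b[0]), max(a[1], b[1])
    maxx, maxy = min(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        return None
    return (minx, miny, maxx, maxy)


def intersect_indexes(raster_boxes: List[Box], vector_boxes: List[Box]) -> List[Box]:
    """Pairwise intersection of per-file boxes (linear scan, not an r-tree)."""
    out = []
    for r in raster_boxes:
        for v in vector_boxes:
            overlap = intersect(r, v)
            if overlap is not None:
                out.append(overlap)
    return out


if __name__ == "__main__":
    rasters = [(0.0, 0.0, 10.0, 10.0)]
    vectors = [(5.0, 5.0, 20.0, 20.0)]
    print(intersect_indexes(rasters, vectors))  # [(5.0, 5.0, 10.0, 10.0)]
```

The key point is that the overlap is computed from whole-file extents. Nothing in this step knows where individual shapes sit inside the vector file.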

Your problem probably stems from the fact that not every area within a file’s bounding box contains shapes. For example, consider the following vector file with 3 features:

+---+
| A | d
+---+---+
| B | C |
+---+---+

If you sample from any of the areas where there is a feature (A, B, C), you’ll get what you expect. However, the bounding box of the vector file also contains region d, where there are no features. This is what is happening in your dataset: when a sampled window falls in a region like d, there are no shapes to rasterize, which is why rasterize raises "No valid geometry objects found for rasterize".
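The sketch below reproduces the diagram with hypothetical unit-square features. It draws windows the way a sampler that only knows the file's bounding box would. Any window that lands entirely inside region d overlaps no feature. In the real dataset, that is a window for which rasterize() receives no shapes.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)

# Hypothetical features matching the diagram; the file's bounding box is
# (0, 0, 2, 2) and the top-right cell "d" contains no feature.
FEATURES: List[Box] = [
    (0.0, 1.0, 1.0, 2.0),  # A
    (0.0, 0.0, 1.0, 1.0),  # B
    (1.0, 0.0, 2.0, 1.0),  # C
]
FILE_BOUNDS: Box = (0.0, 0.0, 2.0, 2.0)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def hits_any_feature(window: Box) -> bool:
    return any(overlaps(window, f) for f in FEATURES)


def random_window(bounds: Box, size: float, rng: random.Random) -> Box:
    """Uniformly place a size x size window inside bounds, using only the
    file-level bounding box."""
    minx = rng.uniform(bounds[0], bounds[2] - size)
    miny = rng.uniform(bounds[1], bounds[3] - size)
    return (minx, miny, minx + size, miny + size)


if __name__ == "__main__":
    rng = random.Random(0)
    windows = [random_window(FILE_BOUNDS, 0.5, rng) for _ in range(1000)]
    misses = sum(not hits_any_feature(w) for w in windows)
    # Each miss is a window that would leave rasterize() with no shapes.
    print(f"{misses} of {len(windows)} windows contain no feature")
```

With a 0.5 window in a 2 × 2 extent, a window misses every feature only when both its minx and miny are at least 1. That happens in roughly one draw out of nine, so failures are frequent but not guaranteed on any single sample.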

One solution to this would be to store each shape as a separate entry in the r-tree index instead of storing things on a file-by-file basis. I haven’t thought about this too much because we don’t have a ton of VectorDatasets in TorchGeo yet, but that may work better than our current approach. This obviously wouldn’t work for raster files. I would have to see how well things work when the r-tree index gets large since some datasets like CBF have millions of shapes.
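A minimal sketch of the per-shape idea follows. For clarity, it swaps the r-tree for a plain list with a linear-scan query, so it shows the indexing granularity rather than the performance. With the rtree package, Index.insert(id, coordinates) and Index.intersection(coordinates) would be the natural replacements for a real spatial index. The class name, shape boxes and file name are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


class PerShapeIndex:
    """Simplified stand-in for an r-tree: one entry per shape rather than
    one per file. Queries use a linear scan, so this only illustrates the
    idea; a real implementation would use a spatial index."""

    def __init__(self) -> None:
        # (shape id, shape bounds, file the shape should be read from)
        self._entries: List[Tuple[int, Box, str]] = []

    def insert(self, shape_id: int, bounds: Box, source_file: str) -> None:
        self._entries.append((shape_id, bounds, source_file))

    def intersection(self, query: Box) -> List[int]:
        """Ids of shapes whose bounds overlap the query box."""
        qminx, qminy, qmaxx, qmaxy = query
        return [
            sid
            for sid, (minx, miny, maxx, maxy), _ in self._entries
            if minx < qmaxx and qminx < maxx and miny < qmaxy and qminy < maxy
        ]


if __name__ == "__main__":
    index = PerShapeIndex()
    # Hypothetical shapes A, B, C from a single file, laid out as in the diagram
    index.insert(0, (0.0, 1.0, 1.0, 2.0), "objects.geojson")  # A
    index.insert(1, (0.0, 0.0, 1.0, 1.0), "objects.geojson")  # B
    index.insert(2, (1.0, 0.0, 2.0, 1.0), "objects.geojson")  # C
    print(index.intersection((1.2, 1.2, 1.7, 1.7)))  # region d: []
    print(index.intersection((0.2, 1.2, 0.7, 1.7)))  # inside A: [0]
```

Region d no longer belongs to any index entry, so a sampler that draws from these entries would never select it. The trade-off is the one raised above: one entry per shape can mean millions of entries for large vector datasets.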

In the meantime, you can also pass a roi (region of interest) to the sampler to tell it to ignore regions with few or no shapes. This is basically just shrinking the bounding box of the dataset to something smaller. Hope that’s helpful!
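One way to pick that smaller box is to take the extent of the shapes themselves. The sketch below is plain Python and assumes the shapes are available as (minx, miny, maxx, maxy) boxes in the dataset CRS. roi_from_features is a hypothetical helper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def roi_from_features(features: List[Box], pad: float = 0.0) -> Box:
    """Smallest box covering all features, grown by `pad` on every side."""
    if not features:
        raise ValueError("need at least one feature to build an roi")
    return (
        min(f[0] for f in features) - pad,
        min(f[1] for f in features) - pad,
        max(f[2] for f in features) + pad,
        max(f[3] for f in features) + pad,
    )
```

The resulting tuple could then be wrapped as BoundingBox(minx, maxx, miny, maxy, mint, maxt) from torchgeo.datasets and passed as roi= to RandomGeoSampler. Note the interleaved field order, and take mint and maxt from dataset.bounds. Because the result is a single rectangle, it can still cover large empty areas when objects are sparse. Some padding, on the order of the window size, may help windows around edge features fit inside the roi.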
