[Discussion] How do we want to handle `torchvision.prototype.features.Feature`'s?
This issue should spark a discussion about how we want to handle `Feature`'s in the future. There are a lot of open questions I'm trying to summarize, and I'll give my opinion on each of them. You can find the current implementation under `torchvision.prototype.features`.
## What are `Feature`'s?

`Feature`'s are subclasses of `torch.Tensor`, and their purpose is threefold:
- With their type, e.g. `Image`, they convey information about the data they carry. The prototype transformations (`torchvision.prototype.transforms`) use this information to automatically dispatch an input to the correct kernel.
- They can optionally carry additional meta data that might be needed for transforming the feature. For example, most geometric transformations can only be performed on bounding boxes if the size of the corresponding image is known.
- They provide a convenient interface for feature-specific functionality, for example transforming the format of a bounding box.
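The three points above can be sketched with a minimal, hypothetical tensor subclass. This is a simplified illustration, not the actual prototype implementation: the `Feature` constructor, the `_meta` dict, and the `image_size` property here are all assumptions made for the example.

```python
import torch

# Minimal sketch of the Feature idea (hypothetical, simplified API):
# a tensor subclass whose type tags the kind of data it holds and which
# can carry extra metadata needed by transforms.
class Feature(torch.Tensor):
    def __new__(cls, data, **meta):
        self = torch.as_tensor(data).as_subclass(cls)
        self._meta = meta  # metadata bag; illustrative only
        return self

class BoundingBox(Feature):
    @property
    def image_size(self):
        # metadata needed by geometric transforms
        return self._meta["image_size"]

box = BoundingBox([10, 20, 50, 60], image_size=(256, 256))
print(isinstance(box, torch.Tensor))  # True: it is still a regular tensor
print(box.image_size)                 # (256, 256)
```

Because the result is still a `torch.Tensor`, existing tensor code keeps working, while a transform can dispatch on `type(box)` and read the attached metadata.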
There are currently three `Feature`'s implemented (`Image`, `BoundingBox`, and `Label`), but in the future we should add at least three more: `SemanticSegmentationMask`, `InstanceSegmentationMask`, and `Video`.
## What is the policy for adding new `Feature`'s?
We could allow subclassing of `Feature`'s. On the one hand, this would make it easier for datasets to conveniently bundle meta data. For example, the COCO dataset could return a `CocoLabel`, which in addition to the default `Label.category` could also have a `super_category` field. On the other hand, this would also mean that the transforms need to handle subclasses of features well, for example treating a `CocoLabel` the same as a `Label`.
I see two downsides with that:

- What if a transform needs the additional meta data carried by a feature subclass? Imagine I've added a special transformation that needs `CocoLabel.super_category`. Although on the surface it now supports plain `Label`'s, it will fail at runtime.
- Documenting custom features is more complicated than documenting a separate field in the sample dictionary of a dataset.

Thus, I'm leaning towards only having a few base classes.
## From what data should a `Feature` be instantiable?

Some of the features like `Image` or `Video` have non-tensor objects that carry the data. Should these features know how to handle them? For example, should something like `Image(PIL.Image.open(...))` work?

My vote is for yes. IMO this is very convenient and also not an unexpected semantic compared to passing the data directly, e.g. `Image(torch.rand(3, 256, 256))`.
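A possible sketch of that convenience is below. It is hypothetical: the duck-typed PIL check and the HWC-to-CHW conversion are illustrative assumptions (real code would test `isinstance(data, PIL.Image.Image)`), and the example runs without Pillow installed.

```python
import torch

class Image(torch.Tensor):
    def __new__(cls, data):
        # Duck-typed stand-in for isinstance(data, PIL.Image.Image),
        # so this sketch does not require Pillow at import time.
        if hasattr(data, "getdata") and hasattr(data, "mode"):
            import numpy as np
            data = torch.as_tensor(np.array(data)).permute(2, 0, 1)  # HWC -> CHW
        return torch.as_tensor(data).as_subclass(cls)

img = Image(torch.rand(3, 256, 256))  # plain tensor input still works
print(type(img).__name__, tuple(img.shape))  # Image (3, 256, 256)
```

The tensor path stays the fast path; the PIL branch only adds a one-time conversion at construction.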
## Should `Feature`'s have a fixed shape?

Consider the following table:

| Feature | `.shape` |
|---|---|
| `Image` | `(*, C, H, W)` |
| `Label` | `(*)` |
| `BoundingBox` | `(*, 4)` |
| `SemanticSegmentationMask` | `(*, H, W)` or `(*, C, H, W)` |
| `InstanceSegmentationMask` | `(*, N, H, W)` |
| `Video` | `(*, T, C, H, W)` |

(For `SemanticSegmentationMask` I'm not sure about the shape yet. Having an extra channel dimension makes the tensor unnecessarily large, but it aligns well with segmentation image files, which are usually stored as RGB.)
Should we fix each feature to a single shape, i.e. remove the `*` from the table above, or should we only require the trailing dimensions to be correct?

My vote is for a flexible shape, since otherwise batching is not possible. For example, if we fix bounding boxes to shape `(4,)`, a transformation would need to transform `N` bounding boxes individually, while for shape `(N, 4)` it could make use of parallelism.
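To illustrate the parallelism argument, here is a sketch of a horizontal flip for XYXY boxes. The helper name and signature are assumptions for this example (torchvision's actual kernels differ), but it shows how one vectorized call handles a single `(4,)` box and an `(N, 4)` batch alike.

```python
import torch

def hflip_boxes(boxes, image_width):
    # boxes: (..., 4) in (x1, y1, x2, y2) format; works for a single box
    # of shape (4,) and for any batch shape (*, 4) alike.
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack([image_width - x2, y1, image_width - x1, y2], dim=-1)

boxes = torch.tensor([[10., 20., 50., 60.],
                      [0.,  0., 100., 100.]])  # shape (N, 4)
print(hflip_boxes(boxes, image_width=100))
```

With the fixed `(4,)` shape, the same operation would require a Python loop over the `N` boxes, losing the vectorization.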
On the same note, if we go for the flexible shape, do we keep the singular name of the feature? For example, do we still regard a batch of images with shape `(B, C, H, W)` as an `Image`, or should we go for the plural `Images` in general? My vote is for always keeping the singular, since I've often seen something like:

```python
for image, target in data_loader(dataset, batch_size=4):
    ...
```
## Should `Feature`'s have a fixed dtype?

This makes sense for `InstanceSegmentationMask`, which should always be `torch.bool`. For all the other features I'm unsure. My gut says to use a default dtype, but also allow other dtypes.
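A "default dtype, but overridable" policy could look like the following minimal sketch (a hypothetical constructor, not the prototype's actual API):

```python
import torch

class Label(torch.Tensor):
    def __new__(cls, data, dtype=torch.int64):
        # Convert to the default dtype unless the caller overrides it.
        return torch.as_tensor(data, dtype=dtype).as_subclass(cls)

print(Label([1, 2, 3]).dtype)                     # torch.int64 by default
print(Label([1, 2, 3], dtype=torch.int32).dtype)  # but other dtypes allowed
```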
## What meta data should `Feature`'s carry?

IMO, this really depends on the decision above about fixed vs. flexible shapes. If we go for fixed shapes, a feature can basically carry any information. If we go for flexible shapes instead, we should only have meta data that stays the same across a batch of features. For example, `BoundingBox.image_size` is fine, but `Label.category` is not.
## What methods should `Feature`'s provide?

For now I've only included typical conversion methods, but of course this is not exhaustive.

| Feature | method(s) |
|---|---|
| `Image` | `.to_dtype()`, `.to_colorspace()` |
| `Label` | `.to_str()` |
| `BoundingBox` | `.to_format()` |
| `InstanceSegmentationMask` | `.to_semantic()` |
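As an illustration of what a conversion method like `BoundingBox.to_format()` might do under the hood, here is a sketch of an XYXY-to-XYWH conversion, written as a free function for simplicity (the function name and format choice are assumptions for the example):

```python
import torch

def xyxy_to_xywh(boxes):
    # (x1, y1, x2, y2) -> (x, y, w, h); works on any (*, 4) shape.
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack([x1, y1, x2 - x1, y2 - y1], dim=-1)

box = torch.tensor([10., 20., 50., 60.])
print(xyxy_to_xywh(box))  # tensor([10., 20., 40., 40.])
```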
cc @bjuncek
Issue analytics: created 2 years ago; 19 comments (8 by maintainers).
## Top GitHub Comments
@vadimkantorov I think we already have what you want. Citing @datumbox from https://github.com/pytorch/vision/issues/5045#issuecomment-1034814339:

> The low-level functions will work with vanilla tensors, so you don't have to use the new abstractions if you don't want. Tentative plan is to expose them from `torchvision.transforms.kernels`. Have a look at #5323 or the underlying branch https://github.com/pmeier/vision/tree/transforms/dispatch/torchvision/prototype/transforms/kernels.

I think your comment gets at the motivation behind some of the questions I ask. For example, I think it's significant that we don't force users to extend some kind of `TorchvisionModule` or `LightningModule` (though of course you can use torchvision models in Lightning). Many vision frameworks already extend and build on top of torchvision, and we wouldn't want to inhibit that, so we need to be careful about our abstractions.

That also has to be balanced with what functionality we want to offer, though. For example, it would be more convenient to be able to support non-RGB images (YUV from videos) without extra parameters on every image operator.