
[Discussion] How do we want to handle `torchvision.prototype.features.Feature`'s?


This issue should spark a discussion about how we want to handle Feature’s in the future. There are a lot of open questions, which I try to summarize below together with my opinion on each of them. You can find the current implementation under torchvision.prototype.features.

What are Feature’s?

Feature’s are subclasses of torch.Tensor and their purpose is threefold:

  1. Through their type, e.g. Image, they convey what kind of data they carry. The prototype transformations (torchvision.prototype.transforms) use this information to automatically dispatch an input to the correct kernel.
  2. They can optionally carry additional meta data that might be needed for transforming the feature. For example, most geometric transformations can only be performed on bounding boxes if the size of the corresponding image is known.
  3. They provide a convenient interface for feature specific functionality, for example transforming the format of a bounding box.
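The three points above can be sketched with a minimal tensor subclass. This is a hypothetical illustration, not the actual prototype API; the Feature base class, the _meta attribute, and horizontal_flip are made-up names:

```python
import torch

# Hypothetical sketch: a torch.Tensor subclass whose *type* identifies the
# kind of data it holds and which can carry extra meta data alongside it.
class Feature(torch.Tensor):
    @staticmethod
    def __new__(cls, data, **meta):
        instance = torch.as_tensor(data).as_subclass(cls)
        instance._meta = meta
        return instance

class Image(Feature):
    pass

class BoundingBox(Feature):
    @property
    def image_size(self):
        return self._meta["image_size"]

def horizontal_flip(feature):
    # A transform can dispatch on the feature type instead of guessing
    # from shape or dtype.
    if isinstance(feature, BoundingBox):
        x1, y1, x2, y2 = feature.unbind(-1)
        width = feature.image_size[1]
        return torch.stack([width - x2, y1, width - x1, y2], dim=-1)
    return feature.flip(-1)  # e.g. an Image
```

Note how the bounding box flip only works because the feature carries the image size as meta data (point 2), while the dispatch itself only needs the type (point 1).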

There are currently three Feature’s implemented:

  • Image,
  • BoundingBox, and
  • Label,

but in the future we should add at least three more:

  • SemanticSegmentationMask,
  • InstanceSegmentationMask, and
  • Video.

What is the policy of adding new Feature’s?

We could allow subclassing of Feature’s. On the one hand, this would make it easier for datasets to conveniently bundle meta data. For example, the COCO dataset could return a CocoLabel, which in addition to the default Label.category could also have the super_category field. On the other hand, this would also mean that the transforms need to handle subclasses of features well, for example a CocoLabel could be treated the same as a Label.
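The first downside below can be made concrete with a small sketch. CocoLabel, super_category, and remap_to_super_category are hypothetical names used only to illustrate the failure mode:

```python
import torch

class Label(torch.Tensor):
    @staticmethod
    def __new__(cls, data, *, category=None):
        instance = torch.as_tensor(data).as_subclass(cls)
        instance.category = category
        return instance

class CocoLabel(Label):
    @staticmethod
    def __new__(cls, data, *, category=None, super_category=None):
        instance = Label.__new__(cls, data, category=category)
        instance.super_category = super_category
        return instance

def remap_to_super_category(label):
    # Type-wise this accepts any Label, but it silently depends on
    # CocoLabel-only meta data and fails at runtime for a plain Label.
    return label.super_category
```

A transform written like this looks polymorphic over Label’s but in fact is not, which is exactly the runtime failure described above.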

I see two downsides with that:

  1. What if a transform needs the additional meta data carried by a feature subclass? Imagine I’ve added a special transformation that needs CocoLabel.super_category. Although on the surface it appears to support plain Label’s, it will fail at runtime.
  2. Documenting custom features is more complicated than documenting a separate field in the sample dictionary of a dataset.

Thus, I’m leaning towards only having a few base classes.

From what data should a Feature be instantiable?

For some features, like Image or Video, there are common non-tensor objects that carry the same data. Should these features know how to handle them? For example, should something like Image(PIL.Image.open(...)) work?

My vote is out for yes. IMO this is very convenient and does not introduce unexpected semantics compared to passing the data directly, e.g. Image(torch.rand(3, 256, 256)).
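A constructor supporting this could look roughly like the sketch below. This is an assumption about how it might work, not the prototype implementation; PIL is only touched indirectly (via the array protocol), so plain tensors take the fast path:

```python
import torch

class Image(torch.Tensor):
    # Hypothetical sketch: accept tensors and PIL images alike.
    @staticmethod
    def __new__(cls, data):
        if not isinstance(data, torch.Tensor):
            import numpy as np
            array = np.asarray(data)  # PIL images support the array protocol
            if array.ndim == 3:
                array = array.transpose(2, 0, 1)  # HWC -> CHW
            data = torch.from_numpy(np.ascontiguousarray(array))
        return data.as_subclass(cls)
```

With this, Image(PIL.Image.open(path)) and Image(torch.rand(3, 256, 256)) would both yield an Image in CHW layout.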

Should Feature’s have a fixed shape?

Consider the following table:

Feature                     .shape
Image                       (*, C, H, W)
Label                       (*)
BoundingBox                 (*, 4)
SemanticSegmentationMask    (*, H, W) or (*, C, H, W)
InstanceSegmentationMask    (*, N, H, W)
Video                       (*, T, C, H, W)

(For SemanticSegmentationMask I’m not sure about the shape yet. Having an extra channel dimension makes the tensor unnecessarily large, but it aligns well with segmentation image files, which are usually stored as RGB.)

Should we fix the shape to a single feature, i.e. remove the * from the table above, or should we only require the trailing dimensions to be correct?

My vote is out for having a flexible shape, since otherwise batching is not possible. For example, if we fix bounding boxes to shape (4,) a transformation would need to transform N bounding boxes individually, while for shape (N, 4) it could make use of parallelism.
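The benefit of the flexible shape can be sketched in a few lines: one vectorized kernel handles a single box of shape (4,), N boxes of shape (N, 4), or a batch of shape (B, N, 4) alike. The function name and xyxy format are assumptions for illustration:

```python
import torch

def horizontal_flip_boxes(boxes, image_width):
    # Works for any leading shape (*, 4) thanks to unbind on the last dim.
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack([image_width - x2, y1, image_width - x1, y2], dim=-1)
```

With a fixed (4,) shape, the same operation would require a Python loop over N boxes instead of a single batched call.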

On the same note, if we go for the flexible shape, do we keep the singular name of the feature? For example, do we regard a batch of images with shape (B, C, H, W) still as Image or should we go for the plural Images in general? My vote is out for always keeping the singular, since I’ve often seen something like:

for image, target in data_loader(dataset, batch_size=4):
    ...

Should Feature’s have a fixed dtype?

This makes sense for InstanceSegmentationMask, which should always be torch.bool. For all the other features I’m unsure. My gut says to use a default dtype, but also to allow other dtypes.
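The "fixed for masks, default for everything else" policy could look like this sketch; the choice of torch.uint8 as the image default is an assumption, not a decision from the issue:

```python
import torch

class InstanceSegmentationMask(torch.Tensor):
    # Hypothetical: dtype is fixed, inputs are coerced to torch.bool.
    @staticmethod
    def __new__(cls, data):
        return torch.as_tensor(data, dtype=torch.bool).as_subclass(cls)

class Image(torch.Tensor):
    # Hypothetical: dtype merely defaults, the caller can override it.
    @staticmethod
    def __new__(cls, data, *, dtype=torch.uint8):
        return torch.as_tensor(data, dtype=dtype).as_subclass(cls)
```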

What meta data should Feature’s carry?

IMO, this really depends on the decision above about fixed vs. flexible shapes. If we go for fixed shapes, a feature can basically carry any information. If we go for flexible shapes instead, we should only keep meta data that is the same for all features in a batch. For example, BoundingBox.image_size is fine, but Label.category is not.

What methods should Feature’s provide?

For now I’ve only included typical conversion methods, but of course this is not exhaustive.

Feature                     method(s)
Image                       .to_dtype(), .to_colorspace()
Label                       .to_str()
BoundingBox                 .to_format()
InstanceSegmentationMask    .to_semantic()
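As an example of what such a conversion method could do internally, here is a sketch of a bounding box format conversion. The format names "xyxy" and "xywh" and the function name are assumptions for illustration:

```python
import torch

def convert_bounding_box_format(boxes, old_format, new_format):
    # Converts boxes of shape (*, 4) between corner and corner+size formats.
    if old_format == new_format:
        return boxes
    if old_format == "xyxy" and new_format == "xywh":
        x1, y1, x2, y2 = boxes.unbind(-1)
        return torch.stack([x1, y1, x2 - x1, y2 - y1], dim=-1)
    if old_format == "xywh" and new_format == "xyxy":
        x, y, w, h = boxes.unbind(-1)
        return torch.stack([x, y, x + w, y + h], dim=-1)
    raise ValueError(f"unsupported conversion: {old_format} -> {new_format}")
```

A BoundingBox.to_format() method would presumably wrap a kernel like this and read the current format off the feature's meta data.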

cc @bjuncek

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 19 (8 by maintainers)

Top GitHub Comments

pmeier commented, Feb 10, 2022 (2 reactions)

@vadimkantorov

I think we already have what you want. Citing @datumbox from https://github.com/pytorch/vision/issues/5045#issuecomment-1034814339

The Feature subclasses are not JIT-scriptable. To address this we have two types of kernels: the high-level ones that can make use of meta data (not JIT-scriptable) and the low-level ones that actually perform the operations and receive the meta data explicitly (JIT-scriptable).

The low-level functions will work with vanilla tensors, so you don’t have to use the new abstractions if you don’t want to. The tentative plan is to expose them from torchvision.transforms.kernels. Have a look at #5323 or the underlying branch https://github.com/pmeier/vision/tree/transforms/dispatch/torchvision/prototype/transforms/kernels.
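The two-tier split described in the comment can be sketched as follows. The function names are illustrative, not the actual kernel API; the point is that the low-level kernel only sees vanilla tensors plus explicit meta data, so it can be scripted, while the high-level dispatcher reads the meta data off the feature and cannot:

```python
import torch

# Low-level kernel: plain tensors in, meta data (the image width) passed
# explicitly. Because it avoids tensor subclasses it is JIT-scriptable.
@torch.jit.script
def horizontal_flip_bounding_box(boxes: torch.Tensor, image_width: int) -> torch.Tensor:
    x1 = boxes.select(-1, 0)
    y1 = boxes.select(-1, 1)
    x2 = boxes.select(-1, 2)
    y2 = boxes.select(-1, 3)
    return torch.stack([image_width - x2, y1, image_width - x1, y2], dim=-1)

# High-level dispatcher: reads meta data off the feature, not scriptable.
def horizontal_flip(feature):
    if hasattr(feature, "image_size"):  # e.g. a bounding box feature
        return horizontal_flip_bounding_box(feature, feature.image_size[1])
    return feature.flip(-1)
```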

nairbv commented, Feb 10, 2022 (1 reaction)

it’s quite cumbersome to mix these frameworks … it’s hard to use / copy only some parts of the framework for modification without buying into the whole thing … I think torchvision is uniquely positioned to drive simplification and reuse efforts

I think your comment gets at the motivation behind some of the questions I ask. For example, I think it’s significant that we don’t force users to extend some kind of TorchvisionModule or LightningModule (though of course you can use torchvision models in Lightning). Many vision frameworks already extend and build on top of torchvision, and we wouldn’t want to inhibit that, so we need to be careful about our abstractions.

That has to be balanced against the functionality we want to offer, though; e.g. it would be more convenient to support non-RGB images (YUV from videos) without extra parameters on every image operator.
