Clarify transformations for image models at inference time
Hello!
I think it might be helpful for the docs to clarify the transformations that images go through, and maybe to provide a public method that encapsulates them. Here’s my current understanding.
1. Types
You can pass a few different types:
```ts
export type ClassifierInputSource = HTMLImageElement | HTMLCanvasElement | HTMLVideoElement | ImageBitmap;
```
2. cropTo
The image data is copied into a new canvas and cropped with cropTo. This resizes to 224x224 using a “cover”-like strategy: the image is scaled so it covers at least 224x224, then cropped from the center.
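For concreteness, here’s a minimal sketch of what a “cover”-style center crop to 224x224 could look like. This is my reconstruction, not the library’s actual cropTo, and the coverCrop name is illustrative:

```ts
// Sketch of a "cover"-style center crop (assumed behavior, illustrative name).
// Scales the image so its shorter side is exactly 224px, then crops the
// longer side's overflow equally from both ends.
function coverCrop(img: HTMLImageElement, size = 224): HTMLCanvasElement {
  const canvas = document.createElement('canvas');
  canvas.width = size;
  canvas.height = size;
  const ctx = canvas.getContext('2d')!;
  const scale = size / Math.min(img.naturalWidth, img.naturalHeight);
  const w = img.naturalWidth * scale;
  const h = img.naturalHeight * scale;
  // Negative offsets center the scaled image and crop the overflow.
  ctx.drawImage(img, (size - w) / 2, (size - h) / 2, w, h);
  return canvas;
}
```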
3. capture
The call to capture grabs the pixels from the image, then crops that tensor with cropTensor. This crop enforces that the image is square, but here it doesn’t do anything, since the image has already been cropped square in cropTo. Finally, it normalizes the values in RGB space to [-1, 1].
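In tfjs terms, my rough understanding of that step is something like the following sketch (not the library’s exact code):

```ts
import * as tf from '@tensorflow/tfjs';

// Sketch of the capture step as I understand it: read pixels, center-crop
// the tensor to a square (a no-op here, since cropTo already produced a
// square canvas), then normalize [0, 255] -> [-1, 1].
function capture(canvas: HTMLCanvasElement): tf.Tensor3D {
  return tf.tidy(() => {
    const pixels = tf.browser.fromPixels(canvas); // int32, values in [0, 255]
    const size = Math.min(pixels.shape[0], pixels.shape[1]);
    const top = Math.floor((pixels.shape[0] - size) / 2);
    const left = Math.floor((pixels.shape[1] - size) / 2);
    const square = tf.slice(pixels, [top, left, 0], [size, size, 3]);
    // x / 127.5 - 1 maps 0 -> -1 and 255 -> 1.
    return square.toFloat().div(127.5).sub(1) as tf.Tensor3D;
  });
}
```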
4. Transparency
It also seems like fully transparent pixels may be translated to rgb(0, 0, 0). That happened in one example image I tried, but I didn’t look further.
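If that’s right, it would be consistent with how canvases work: a fresh canvas starts out as transparent black (rgba(0, 0, 0, 0)), drawing a fully transparent pixel over it leaves it unchanged, and reading back only the RGB channels drops the alpha. A quick probe (here img is assumed to be a loaded image whose top-left pixel is fully transparent):

```ts
// Probe: fully transparent source pixels read back as rgb(0, 0, 0) once
// drawn onto a fresh canvas and stripped of their alpha channel.
const probe = document.createElement('canvas');
probe.width = probe.height = 1;
const ctx = probe.getContext('2d')!;
ctx.drawImage(img, 0, 0); // `img` assumed: top-left pixel has alpha = 0
const { data } = ctx.getImageData(0, 0, 1, 1);
console.log(data); // Uint8ClampedArray [0, 0, 0, 0] -> RGB reads as black
```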
Is that capturing it? This scaling, cropping, and color behavior seems important for callers (or users) to be aware of.
Exposing as a function
I think ideally this library would also expose its pre-processing for callers to use, so that tools built on it can apply the same pre-processing. Otherwise, if you made a tool that visualized the images the model predicts on, you might naively render the input image (which isn’t actually what the TM model sees), or you might compare the TM model to other models without using the same pre-processing step. Concretely, one suggestion would be to expose something like:
```ts
model.preprocess(image: ClassifierInputSource)
```
Returns a Tensor representing the image, after applying any transformations that the model applies to an input (e.g., scaling, cropping, or normalizing). The specifics of the particular transformations are internals of the library and subject to breaking changes, but this public method would be stable.
Args:
- image: an image, canvas, or video element to make a classification on
Usage:
```js
const img = new Image();
img.src = '...'; // some image that is larger than 224x224px, not square, and has some transparency
img.onload = async () => {
  const tensor = await model.preprocess(img);
  const canvas = document.createElement('canvas');
  await tf.browser.toPixels(tensor, canvas);
  document.body.appendChild(canvas);
};
document.body.appendChild(img);
```
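One wrinkle with rendering the result this way: tf.browser.toPixels expects float tensors with values in [0, 1], so a [-1, 1] tensor would need to be mapped back before display, which may account for artifacts like the background color below:

```ts
// toPixels expects floats in [0, 1], so map [-1, 1] back before drawing.
const displayable = tensor.add(1).div(2) as tf.Tensor3D;
await tf.browser.toPixels(displayable, canvas);
```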
[original image]
[pre-processed image]
(note also the color in the background, which I’m assuming was introduced by translating from [0, 255] => [-1, 1] => [0, 255], but I didn’t look further)
Thanks for sharing this awesome work 😄
Top GitHub Comments
Thanks for sharing, Kevin! We’ll look into this as soon as a bit of work clears up. Some great thoughts and suggestions in here.
@kevinrobinson Well, yes, I am experiencing this issue. I built an image classifier, but at inference time I am not sure what sort of processing to apply to the input image. I know the images used in training are cropped to a square, but nothing about the dimensions or scaling. I wish that information could be made public so I could implement it in my JavaScript code.